Learning: A Working Definition
A relatively permanent change in behavior, thoughts, or feelings as a result of experience. It allows us to:
- Use past experience to predict the future.
- Adapt to a rapidly changing environment.
- Exert control over our environment.
Examples:
Which of these represent learning (drawn from Rocklin, 1987)?
- An infant who stops thumb-sucking?
- Children being able to use language?
- A patient who has a lobotomy and no longer manifests psychotic behavior?
- A zinnia plant being pinched back and then growing more dense foliage and flowers?
- Zeb throwing his cigarettes away after 30 years of smoking two packs a day?
- A computer that uses its first 100 tabulations to influence its choice of opening moves in a chess game?
How do we adapt to the world? Through learning.
Learning may be similar for lower animals and humans. If we can understand learning, we can understand much about behavior.
The Basic Issues (that we'll cover anyway):
- Does learning involve thinking? (Early researchers thought it did not.)
- Are the behaviors of ourselves and other animals governed by similar laws of learning?
- Can lower animals learn?
- Do we learn in the same way as animals? (If yes, and if we can understand these laws, the potential for improving life is immeasurable.)
What governs most of our behavior?
- Unconscious motives & innate drives?
- Rational, logical problem solving?
- Learned habits? What's rewarded?
- How about aggression: Is it learned or innate?
- How about helplessness? Learned, innate? An excuse for laziness?
- How about depression? Learned, innate? A way to manipulate others?
- How do we learn fears, guilt, pleasures?
- Do we learn more when the reward is greater?
- The greater the reward, the more we enjoy the behavior, right?
Who is the acknowledged founder of behaviorism?
ANSWER: John B. Watson
(Cartoons by Mark Parisi. Used by special permission. For many more, visit his site.)
Learning by Association (simplest form of learning)
Ivan Pavlov, a turn-of-the-century Russian physiologist, confronted with the ability of his research dogs to adapt to laboratory conditions, set out to understand learning and made important discoveries.
Classical or Pavlovian Conditioning
The connectionist explanation: we learn to associate ideas or events because they occur close together. Think of the adaptive value of learning when one event predicts another.
Pavlov and his experiments: Digestion in dogs
Pavlov's example: Dogs reflexively salivate to food in the mouth.
Food is an unconditioned stimulus (UCS) for the salivation response.
Salivation is an unconditioned response (UCR) to the food (you don't need to learn to salivate to food; it's an automatic response).
Pavlov wondered whether you could take these natural associations between certain stimuli and responses and use them to produce true learning.
He noticed that his dogs would start to salivate before they were even given any food, e.g., when the lab assistant simply opened the door to the room in which the dogs were housed (just like you might start to "salivate" when the clock chimes 6 p.m.).
He asked: What is going on here? He developed an experiment to test some ideas.
He took a neutral stimulus, i.e., a stimulus that can catch the learner's attention but does not elicit the UCR (e.g., a bell).
At baseline, the situation looks like:
Neutral Stimulus (Rung Bell) ---> No salivation
Unconditioned Stimulus (Food; UCS) ---> Salivation (UCR)
During the conditioning (learning) trials:
He repeatedly rang the bell just before presenting food.
Neutral Stimulus (Rung Bell) + UCS (Food) ---> Salivation (UCR)
After conditioning (learning):
He repeatedly rang the bell without showing the dog any food and found:
Conditioned Stimulus (Rung Bell; CS) ---> Salivation (Conditioned Response; CR)
The initially neutral stimulus (bell) became a learned (conditioned) elicitor of the response.
conditioned stimulus (CS) = an initially neutral stimulus to which you have learned to give the response.
conditioned response (CR) = similar to the UCR, but made to the CS because of learning.
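One way to picture the conditioning trials is as an association that grows a little with every bell-food pairing. The sketch below is only an illustration: the update rule (move a fixed fraction of the remaining distance toward a ceiling), the `learning_rate` value, and the name `acquisition_curve` are all assumptions made for this example, not something Pavlov or these notes specify.

```python
def acquisition_curve(pairings, learning_rate=0.3, max_strength=1.0):
    """Return the bell-food association strength after each pairing.

    Assumption (not from the notes): on every pairing the strength moves
    a fixed fraction (learning_rate) of the remaining distance toward
    max_strength.
    """
    strength = 0.0  # baseline: the bell is neutral, so no salivation
    history = []
    for _ in range(pairings):
        strength += learning_rate * (max_strength - strength)
        history.append(strength)
    return history


curve = acquisition_curve(pairings=5)
print([round(s, 3) for s in curve])
```

Under these assumptions the curve rises quickly at first and then levels off, which matches the idea that the bell gradually becomes a reliable elicitor of salivation.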
Examples of classical conditioning in humans
Many phobias:
- fear of dental work
- fear of dogs
- fear of snakes
• waking up right before your alarm goes off
• feeling nauseous or dizzy upon entering a hospital
• many ad campaigns use classical conditioning
• How Tamara learned to:
- never eat raisins again
- hate the movie Gulliver's Travels (as a kid)
• Little Albert's fears (Watson & Rayner)
• How classical conditioning applies to our fear in the movie "Jaws"
Important Phenomena associated with Classical Conditioning
Extinction (similar to the common-sense idea of "forgetting")
After the conditioning phase, the CR gets weaker and weaker when the CS is NOT accompanied by the UCS (food). It gets weaker NOT because the organism no longer remembers the UCS-CS connection. It gets weaker because the CR is somehow inhibited.
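How does Pavlov know this? Spontaneous recovery
After extinction trials and a rest period, the dog will start to salivate again in response to the CS without any further training. Relearning is also fast: the dog salivates to the CS again after only one pairing of UCS and CS.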
Is there a best time relation between CS and UCS? It's complicated. Just remember:
Law of Contiguity: The closer the two are in time (with the neutral stimulus preceding presentation of the UCS), the stronger the conditioning (generally speaking).
Generalization
In generalization, the CR is also 'given' to stimuli that are similar to the CS. Examples:
- salivation and tones vs. Bach
- flowers and fear
Discrimination
In discrimination, the CR is 'given' to some stimuli, but not others. Examples:
- salivation and tones vs. Bach
- flowers and fear
Remember about Classical Conditioning
Learning takes place fairly involuntarily. It is a more passive kind of learning. What is learned is an association between two stimuli (the neutral stimulus and the unconditioned stimulus).
IMPORTANT:
- Know why Pavlov would call classical conditioning S-S learning
- Know why Watson would call this S-R learning
- Be able to explain how Rescorla's study supported the S-S interpretation
Operant Conditioning
The area of Operant Conditioning owes its heritage to Thorndike's Law of Effect.
The fundamental principle behind operant conditioning is that responses leading to "pleasant" effects are more likely to be repeated in the future than those leading to "discomforting" ones.
Idea of Operant Conditioning
You operate on the environment and your behavior has effects. The consequences of your voluntary behavior determine which behaviors or responses you learn, retain, or forget (Skinner).
This principle doesn't necessarily apply to reflexes and emotions (as classical conditioning does).
Does it apply to heart rate, blood pressure, love, anger, fear? Only if you can voluntarily regulate them.
Differences with classical conditioning
In classical conditioning the most important components are:
stimuli eliciting ---> responses
"Meat" ---> "dog salivating (involuntary)"
new stimuli (acting as substitutes)
"bell" eliciting ---> salivation
In operant conditioning the important components are:
behaviors ---> leading to ---> consequences
boy gives girl a flower ---> girl takes flower ---> kisses boy and says "thank you"
The consequences influence your future behavior.
cue ---> organism responds ---> leading to consequence
What types of consequences are important?
If operant conditioning (learning) is controlled by the consequences, then do different consequences lead to different results? Yes!
Reinforcers
Reinforcers by definition are "consequences" that strengthen behavior.
cue ---> organism responds ---> has reinforcing consequence
(a) boy gives flower ---> girl takes flower ---> girl kisses boy
Giving the flower is reinforced by the girl kissing the boy.
(b) new toy ---> child throws tantrum ---> gets toy
The child throwing a tantrum is reinforced by the child getting the toy.
(c) homework ---> you study ---> good grade
Studying is reinforced by you getting a good grade.
These are all examples of positive reinforcement, in which a consequence that is pleasant for the organism strengthens the behavior that it followed. Note that here a pleasant stimulus comes after the response.
Punishments
Punishments by definition are consequences that decrease the behavior they follow.
cue ---> organism responds ---> has punishing consequence
(a) boy gives flower ---> girl takes flower ---> girl slaps boy
Giving the flower is punished by the girl slapping the boy.
(b) new toy ---> child throws tantrum ---> doesn't get toy
The child throwing a tantrum is punished by not getting the toy.
(c) homework ---> you study ---> worst grade ever
Studying is punished by you getting a bad grade.
Punishment is not always good to use
There are many negative side effects of punishment, such as anger, sadness and avoidance.
Can you think of other side effects?
It is important to remember that what is reinforcing for one person might not be for another. So don't fall into the trap of assuming that you know what is reinforcing for a person. The important thing is to find out what is reinforcing for the person whose behavior you are trying to change. Then, use this (not your own preconceived notions) to change the person's behavior. The same idea applies to punishment. What might be a punishment for you could be a reinforcer for another, or irrelevant to them.
Perverted example: The flower guy might be a masochist who likes being slapped. In this case, will his flower giving be reinforced or punished by the girl slapping him?
A more normal example: Kids are fighting. Mom sends older brother to his room. Mom's intent is to punish the older brother. But is this necessarily what she is doing?
Negative Reinforcers
All of the above is relatively easy. Now comes the part that is confusing to some students:
Remember, reinforcers necessarily strengthen behavior. Some reinforcers follow the behavior they are meant to strengthen. These are positive reinforcers. They are called positive NOT because they feel good. Don't confuse positive with a value judgment. They are called positive in this literature because they are consequences that are ADDED TO (+) the behavior they are meant to reinforce -- they FOLLOW that behavior.
Negative reinforcement by definition is behavior strengthened because its enactment REMOVES an annoying stimulus for the organism.
Some reinforcers operate because they remove or withdraw an aversive stimulus for the organism. Examples:
Seat belt buzzer ---> you put seat belt on ---> turns off buzzer
Kid's tantrum ---> Mom gives in ---> turns off kid's tantrum
Husband beats wife ---> Wife says "I love you" ---> husband stops beating
Dad nags to mow lawn ---> You mow lawn ---> Dad stops nagging
In all of these examples: Doing the behavior (putting the seat belt on, Mom giving in, wife professing love, you mowing the lawn) REMOVES the aversive stimulus (buzzer, tantrum, beating, nagging). Unlike in positive reinforcement, the aversive stimulus typically precedes the behavior that it is meant to reinforce. By turning off the stimulus, you are reinforced. It is called negative reinforcement because doing the behavior SUBTRACTS (-) the aversive stimulus. It is also known as ESCAPE conditioning.
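The distinctions above can be summarized as a small decision rule: first ask what happens to the behavior, then ask whether a stimulus was added or removed. The following is a minimal sketch of that rule; the function name `classify_consequence` and the string labels are hypothetical and chosen only for illustration.

```python
def classify_consequence(stimulus_change, behavior_effect):
    """Label a consequence using the definitions in these notes.

    stimulus_change: "added" or "removed" (what happens right after the behavior)
    behavior_effect: "strengthened" or "weakened" (what happens to the behavior)
    """
    if stimulus_change not in ("added", "removed"):
        raise ValueError("stimulus_change must be 'added' or 'removed'")
    if behavior_effect not in ("strengthened", "weakened"):
        raise ValueError("behavior_effect must be 'strengthened' or 'weakened'")

    # Punishment is defined by its effect: the behavior decreases.
    if behavior_effect == "weakened":
        return "punishment"
    # Reinforcement strengthens behavior; the sign says how.
    if stimulus_change == "added":
        return "positive reinforcement"
    return "negative reinforcement"


# Seat belt example: buckling up REMOVES the buzzer, and buckling up increases.
print(classify_consequence("removed", "strengthened"))
# Flower example: the girl ADDS a kiss, and flower giving increases.
print(classify_consequence("added", "strengthened"))
```

Note that the label depends on the effect on behavior, not on whether the consequence seems pleasant to an outside observer.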
Important Phenomena associated with Operant Conditioning
Extinction
Extinction occurs when the operant behavior is weakened because it is no longer followed by a reinforcing consequence.
cue ---> organism responds ---> nothing
boy gives girl a flower ---> girl takes flower ---> girl does nothing
new toy ---> child throws tantrum ---> nothing
Eventually, the boy will stop giving flowers and the kid will stop throwing tantrums.
Generalization
Generalization is when you are teaching an organism to display the behavior in response to cues that are similar, but not identical, to those used in the initial learning trials. We'll see this in the movie Harry.
Discrimination
Discrimination is when you are teaching an organism to display the behavior in response to certain cues, and not others. That is, it should behave only when the discriminative stimulus appears. We'll also see this in the movie Harry.
Other examples of discrimination:
- Stop signs (vs. yield signs)
- Three knocks at door (vs. fewer or more)
- Mom's good mood (vs. bad mood)
Shaping
Shaping occurs when successively closer approximations to the ultimately desired behavior are reinforced (until the behavior you really want is established).
Sometimes the behavior we want the organism to learn doesn't automatically appear for us to reinforce. Instead, we need to use principles of reinforcement to get the organism "there" by progressively reinforcing responses that "look" like the response we want.
Examples of Shaping:
- Potty training
- Animal training
- Examples in the movie Harry.
Primary vs. Secondary Reinforcers
Primary reinforcers: The person doesn't need to learn that the consequence is reinforcing. These consequences are innately reinforcing (like food, water, sex).
Secondary reinforcers: The person first needs to learn that the stimulus has reinforcing value (e.g., money, compliments). See Gray on tokens.
Oftentimes, we'll start out using primary reinforcers and then switch to secondary. Why switch? Primary reinforcers are more "expensive" and more difficult to come up with when needed!
Maximizing learning
To be effective, reinforcers or punishers must be:
- contingent (consistent)
- immediate (e.g., punishment must be swift and certain)
- of the right magnitude
Schedules of reinforcement
Question: Is it important to consistently reinforce, punish or extinguish behavior? Or is it just as effective to occasionally do so?
Continuous reinforcement
This is simply when you're reinforced every time you respond. For example:
A boy gives girl a flower ---> the girl takes flower ---> kisses boy and says "thank you" EVERY SINGLE TIME.
The consequences of continuous reinforcement are:
- very rapid learning
- rapid responding
- very rapid extinction (forgetting)
Partial Reinforcement "Schedules": Examples of Ratio Schedules
Partial reinforcement is when you're reinforced only after "a number of the desired behaviors."
The consequences of partial reinforcement differ depending upon the exact schedule used. There are two primary types of ratio schedules: fixed and variable.
Fixed ratio partial reinforcement
In fixed ratio partial reinforcement, the number of responses until reinforcement is delivered is absolutely constant. You know exactly how often you have to show the desired behavior before you get reinforced. For example:
- Every 5th car I sell, I get a bonus.
- Every 10th customer I sign up with MCI, I get paid.
- Every 20th time I bug my Mom for clothes, she buys them.
The consequences of fixed ratio partial reinforcement are:
- slow learning
- variable responding
- slow extinction
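A fixed ratio schedule is easy to state as a counting rule. The sketch below lists which responses would be reinforced under such a rule; the function name `fixed_ratio` is hypothetical and used only for illustration.

```python
def fixed_ratio(ratio, total_responses):
    """Return the 1-based response numbers that earn reinforcement.

    Every `ratio`-th response is reinforced, so the learner can always
    count down to the next reinforcer.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return [n for n in range(1, total_responses + 1) if n % ratio == 0]


# "Every 5th car I sell, I get a bonus": bonuses over the first 20 sales.
print(fixed_ratio(5, 20))  # [5, 10, 15, 20]

# Continuous reinforcement is the special case where the ratio is 1.
print(fixed_ratio(1, 3))   # [1, 2, 3]
```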
Variable ratio partial reinforcement
In variable ratio partial reinforcement, the number of responses until reinforcement varies randomly, very much like a slot machine. You don't have a clue how many responses you need to emit in order to receive the desired consequence.
What if you didn't know how many cars you had to sell, customers you had to sign, flowers you had to give to get the desired effect? What would your behavior look like? For example:
A boy gives a girl a flower ---> the girl takes the flower ---> kisses boy and says "thank you".
BUT the next time...
A boy gives a girl a flower ---> the girl takes the flower ---> the girl does nothing.
Pop quizzes are another good example.
The consequences of variable ratio partial reinforcement are:
- slower learning
- very fast responding
- very slow extinction
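The slot-machine idea can be sketched as a counter whose target is redrawn at random after every reinforcer. This is only an illustration: drawing the required count uniformly between 1 and `2 * mean_ratio - 1` is an assumption made here so that the average works out to `mean_ratio`, and the function name `variable_ratio` is hypothetical.

```python
import random


def variable_ratio(mean_ratio, total_responses, seed=None):
    """Return the 1-based response numbers that earn reinforcement.

    Assumption: the number of responses required for each reinforcer is
    drawn uniformly from 1 .. 2 * mean_ratio - 1, which averages mean_ratio.
    """
    rng = random.Random(seed)
    reinforced = []
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for n in range(1, total_responses + 1):
        count += 1
        if count == required:
            reinforced.append(n)
            count = 0  # start counting toward a new, unknown target
            required = rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


# The learner cannot tell from the pattern which response will pay off next.
print(variable_ratio(mean_ratio=5, total_responses=40, seed=3))
```

Because the next payoff could always be one response away, there is no safe moment to pause, which fits the very fast responding listed above.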
Partial Reinforcement "Schedules": Examples of Interval Schedules
With interval schedules you're reinforced only after a certain interval of time. The consequences of this type of partial reinforcement differ depending upon the exact schedule used. There are two primary types of interval schedules: fixed and variable.
Fixed interval partial reinforcement
With fixed interval partial reinforcement, you get reinforced for responding once a fixed period of time has elapsed SINCE the last reinforcement. For example:
- Getting paid every two weeks
- Having an exam every four weeks
The consequences of fixed interval partial reinforcement are that you emit the response right before you know you'll be reinforced and that's it (think of studying).
Variable interval partial reinforcement
With variable interval partial reinforcement, you get reinforced "on average" every x units of time for making a response. For example:
I say you'll be tested on average every three weeks during a 16-week period. You don't know exactly when the test will happen. You could be tested:
Week 2, Week 7, Week 11, Week 13 or Week 14
This averages out to being tested roughly every three weeks. But you don't know exactly when the test is going to take place. Since you can't exactly predict the test dates, it keeps you on your toes (with textbook in hand!). This is better to use than a fixed interval of exactly every 3 weeks. If you knew you'd be tested every third week, you'd only study RIGHT BEFORE each test.
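The "roughly every three weeks" figure can be checked with a little arithmetic. The sketch below assumes that all five listed weeks are test dates; the variable names are hypothetical.

```python
test_weeks = [2, 7, 11, 13, 14]  # assumed: all five listed weeks are tests
term_length_weeks = 16

# Average spacing over the whole term: 16 weeks / 5 tests.
average_interval = term_length_weeks / len(test_weeks)
print(average_interval)  # 3.2

# Actual gaps between consecutive tests (counting from the start of term).
gaps = [later - earlier for earlier, later in zip([0] + test_weeks, test_weeks)]
print(gaps)  # [2, 5, 4, 2, 1]
```

The average is close to three weeks, but the individual gaps range from one week to five, which is exactly why the test dates cannot be predicted.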
Ivan Pavlov
Pavlov was awarded the Nobel Prize in 1904 for research in what area?
ANSWER: the reflexes involved in digestion
Pavlov, dogs, and a lot of saliva.
The fact that Little Albert, after being conditioned to fear white rats, also feared cotton balls, stuffed white animals and little pieces of white fur is an example of what?
ANSWER: generalization
What kind of animals did Edward Lee Thorndike train to escape from puzzle boxes?
ANSWER: cats
Edward Lee Thorndike
What do we call the use of operant conditioning to control physiologically based problems such as high blood pressure and migraine headaches?
ANSWER: biofeedback training
A secondary reinforcer, like money, which can be saved and exchanged later for another reinforcer is called what?
ANSWER: token
Watson's or Skinner's Radical Behavioral Perspective
Remember, they saw the "mind" as a "black box." Thinking/cognition, to them, is irrelevant to accounting for how we learn. All we need to know from Watson's or Skinner's perspective are the environmental contingencies, from which we can predict/control your behavior.
More "Liberal" Views of Learning
S (stimulus) - O (organism) - R (response)
The organism interprets (perceives, anticipates, "thinks" about) the stimulus before any response is made. This interpretation affects what it learns and how it behaves.
Razran's, Volkova's, and Herrnstein's experiments nicely illustrate this. Study these!
Think (oops, I used that word again) of Herrnstein's experiment with concept formation in Shakespearean pigeons. They obviously had to analyze the slides in order to learn when to emit the response and when not to.
"To peck or not to peck, that is the question!"
The cognitive view of learning often seems so obvious to us that it's difficult to fathom learning without cognition!
The cognitive approach is represented well in an area called "animal cognition."
Back to Pavlov and Classical (S-S) Conditioning
What is the cognitive perspective on classical conditioning?
The organism has learned an expectancy. For example, the baby learned to expect a "bee" (the unconditioned stimulus) upon seeing flowers (the conditioned stimulus).
flowers ---> bee expectation ---> crying
music ---> shark expectation ---> fear
Back to Operant (S-R) Conditioning
First remember what S-R theory said (in much simplified form):
| Basic Ideas | Example |
| There is a STIMULUS to behavior | Cute guy at party grabs your attention |
| You give a RESPONSE | You flirt with him |
| There is a CONSEQUENCE of your response | He asks you out for a date (probably: positive reinforcement for you) |
| CONSEQUENCE strengthens your flirting behavior | You flirt again... and on and on until you get married |
So, the consequence (reinforcer) stamps in the link between the stimulus (cute guy) and your response (flirting).
Now consider Tolman's means-end idea (a more cognitive explanation):
| Basic Idea | Example |
| There is a STIMULUS to behavior | Cute guy at party grabs your attention |
| You THINK of what to do | I could flirt with him and he'd like that... and ask me out. You know that flirting is a MEANS to getting asked out (the END) |
| You're still THINKING | But, do I really want to go out with him? |
| You DECIDE what to do | I guess I'll flirt |
| You give a RESPONSE | You flirt with him |
| There is a CONSEQUENCE | He asks you out.... |
| etc. | |
There was a lot of resistance to this kind of idea. It seemed to imply so much conscious AWARENESS and deliberation about our behavior. But Tolman (and others) didn't mean that we (or the rat or the pigeon) sat there consciously deliberating. We do, however, have knowledge of which behaviors will lead to which consequences; we also "decide" whether to engage in behaviors depending upon whether we want that particular consequence at that particular moment in time!
Other evidence for the cognitive explanation:
You should know:
- What is a negative contrast effect?
- What is a positive contrast effect?
- How are these explained from the cognitive perspective?
- Do they occur in all organisms? Why? Why not?
- How do organisms learn cognitive maps?
- What actually is involved in place learning if it's NOT reward?
Overjustification effect
Strict S-R interpretation: a larger reward should increase the rate of behavior, right?
Wrong: sometimes extrinsic rewards actually decrease the rate of behavior.
Why?
Lepper, Greene, & Nisbett Study
Do rewards necessarily increase the probability of behavior?
Nursery school children were placed in one of three conditions:
Expected reward - Knew they'd receive a reward if they drew well with magic markers.
Unexpected reward - Did not expect a reward, but did receive it.
No reward - Just played with magic markers. Didn't expect and didn't receive a reward.
The findings of the Lepper, Greene, & Nisbett study
The expected reward condition led to a decrease in the behavior during the post-test. An expected reward decreases the intrinsic interest value of the task. An expected reward "makes" you feel externally rather than internally controlled.
Observational Learning
Back to Tolman and Latent Learning
Tolman's place learning research showed that animals learn (= acquire a new behavior) even if they aren't rewarded for it. Rewards don't seem to necessarily/always affect learning. Instead, rewards sometimes affect whether we show the behavior we've learned.
How does this explain Tolman's research on place learning? (Know this)
Is all voluntary behavior learned this way? Do we have to respond to learn?
This led to another Basic Idea: We can learn from observing the behavior of others.
If this is true, you do not have to actually experience consequences or respond in order to learn. For example, children will do not only what they are reinforced for. They will also do what they see you do, what they see on TV, and what they see your friends, their friends, the teachers, etc. do. This is known as Modeling or Observational Learning.
Bandura's Bobo doll experiments with preschool kids
Rank order the following conditions, using 1-3:
(1) = When given the chance to display aggression, these kids would aggress the most
(2) = When given the chance to display aggression, these kids would aggress the next most
(3) = When given the chance to display aggression, these kids would aggress the least
After seeing an adult model punished for aggressing?
After seeing an adult model reinforced for aggressing?
After seeing an adult model aggress (without being reinforced or punished)?
Factors Influencing Observational Learning?
Attention? Retention? Motor Reproduction? Reinforcement/Incentive?
Which of these are involved in actual learning/acquisition?
Which of these are involved in whether what you've learned is displayed?