Retelling the Text “Trong lòng mẹ” in the First Person

    Childhood is an unforgettable part of every person's memories, and the writer Nguyên Hồng is no exception. His are pages of memoir filled with tears, the aching sobs of a heart that tasted life's bitterness too early, that lacked affection and always longed for a mother's love. The author distilled all of this in the work Những ngày thơ ấu, especially in the excerpt Trong lòng mẹ. The excerpt reveals the sensitive, pure, innocent soul of a heart that always revered its mother: little Hồng.

    Hồng was born of a reluctant, loveless marriage: the father was an addict, and the young mother, who always longed for love, had to bury her youth beside an addicted husband. In the end the father died, and the mother left Hồng and his younger sibling behind to go “tha hương cầu thực” (seek a living far from home), amid the coldness of the relatives. Meanwhile, his paternal aunt kept sowing foul schemes in the boy's mind so that he too would doubt and despise his mother. Yet Hồng not only did not hate his mother; he understood and sympathized with her, and came to hate all the more the old customs that had tormented her. The prejudices of a harsh society left a deep mark on Hồng's tender soul and gave him thoughts beyond his years, but they could never erase his love and reverence for his mother. Although he lived in somewhat better material circumstances than homeless street children, Hồng's situation was perhaps all the more pitiable. He already received no affection from his relatives, and now even his love for his mother was being taken from him by others. But what overcame everything was his deep love for his mother.

    That afternoon, just after school, he glimpsed a figure in a rickshaw who looked like his mother and at once called after her in a flustered voice. When his mother turned around, Hồng threw himself into her arms, and mother and son tearfully asked after each other. In his mother's arms he no longer gave the slightest thought to his aunt's venomous words. Moved and full of hurt, the boy cried just like a child. His crying rang out louder, no longer laden with sorrow but overflowing with happiness. The tears of that day mingled between the two of them, an outpouring of both souls, mother and child, that makes up maternal love. The story leaves readers with a deep feeling for the sacred bond between mother and child in this world.

Retelling the Text “Trong lòng mẹ” in the Third Person

    Nguyên Hồng's prose is rich in feeling, sweet, and supple, well suited to memories of his mother and of childhood. Perhaps that is why some have said that “Nguyên Hồng is the writer of women and children.” The remark sums up almost the whole of Nguyên Hồng's literary career and is especially true of the excerpt Trong lòng mẹ.

    It was no accident that Nguyên Hồng turned to women and children. From his first two books, the novel Bỉ vỏ and the memoir Những ngày thơ ấu, the writer took pains to write about their hardships. Having once endured a destitute life in xóm Cấm, Hải Phòng, Nguyên Hồng understood those sorrows as he did his own life. One could say that the memoir pages about his “childhood days” are steeped in memories of maternal love; in them the author lets love rise above so much spiteful prejudice and shine.

    Trong lòng mẹ is a short excerpt with three characters: two women and a boy. The characters differ in personality, but all come to life vividly and memorably under Nguyên Hồng's pen. The excerpt shows the writer's keen understanding of women and children, especially his grasp of personality and psychology.

    The aunt is built through dialogue. She is never described in close detail, yet her character gradually emerges from what she says. She is a typical model of cruelty and envy. Her pettiness cuts Hồng to the quick. Her cold, malicious words seem distilled from the many bitter lives Nguyên Hồng had encountered. Evil comes in many kinds, but cruelty, hypocrisy, and envy everywhere wear faces like that of Nguyên Hồng's character.

    The author understands his antagonist deeply, but he proves far more subtle still when he uncovers the beauty of love in Hồng's tender soul. Hồng's love for his mother overcomes all of his aunt's venomous slander. Within him, the memory and image of his mother are always infinitely beautiful and pure. Though there are moments of doubt, Hồng holds fast to loving thoughts of his mother. This shows how well Nguyên Hồng understood childhood. There, and one might say in all of us, the one shimmering light is the gentleness and love of a mother's heart. Through the figure of Hồng, the writer seems to have made maternal love in this world many times more sacred and meaningful.

    The character who speaks least yet leaves us most haunted is Hồng's mother. She must surely be an exceedingly gentle woman. One need only watch how she welcomes Hồng and gathers the small creature into her arms to feel how deep and noble maternal love is. Unable to express fully either the mother's pain at being parted from her child or her happiness on the day of their reunion, the writer lets the poor mother remain silent. On that day who knows how many feelings are stirring in her heart: joy, sadness, worry, and hurt as well. Silence thus becomes the most delicate form of expression.

    Writing about women and children, or about childhood memories, is not hard, but writing about them well is not easy at all. Nguyên Hồng's prose has a natural wellspring in the themes of women and childhood. That wellspring is distilled from Nguyên Hồng's own love and from his beautiful, deep childhood memories of his beloved mother.

Sensory Evaluation of Food: Principles and Practices, Second Edition (Food Science Text Series)

    Food Science Text Series

    The Food Science Text Series provides faculty with the leading teaching tools. The Editorial Board has outlined the most appropriate and complete content for each food science course in a typical food science program and has identified textbooks of the highest quality, written by the leading food science educators.

    Series Editor: Dennis R. Heldman

    Editorial Board:

  • David A. Golden, Ph.D., Professor of Food Microbiology, Department of Food Science and Technology, University of Tennessee
  • Richard W. Hartel, Professor of Food Engineering, Department of Food Science, University of Wisconsin
  • Hildegarde Heymann, Professor of Food Sensory Science, Department of Food Science and Technology, University of California-Davis
  • Joseph H. Hotchkiss, Professor, Institute of Food Science and Institute for Comparative and Environmental Toxicology, and Chair, Food Science Department, Cornell University
  • Michael G. Johnson, Ph.D., Professor of Food Safety and Microbiology, Department of Food Science, University of Arkansas
  • Joseph Montecalvo, Jr., Professor, Department of Food Science and Nutrition, California Polytechnic and State University-San Luis Obispo
  • S. Suzanne Nielsen, Professor and Chair, Department of Food Science, Purdue University
  • Juan L. Silva, Professor, Department of Food Science, Nutrition and Health Promotion, Mississippi State University

    For further volumes: http://www.springer.com/series/5999

    Harry T. Lawless · Hildegarde Heymann

    Sensory Evaluation of Food: Principles and Practices, Second Edition


    Harry T. Lawless, Department of Food Science, Cornell University, Stocking Hall, Room 106, Ithaca, NY 14853, USA

    ISSN 1572-0330
    ISBN 978-1-4419-6487-8
    e-ISBN 978-1-4419-6488-5
    DOI 10.1007/978-1-4419-6488-5
    Springer New York Dordrecht Heidelberg London
    Library of Congress Control Number: 2010932599

    © Springer Science+Business Media, LLC 2010

    All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

    Printed on acid-free paper

    Springer is part of Springer Science+Business Media (www.springer.com)

    Preface


    Peryam, David Stevens, Herb Meiselman, Elaine Skinner, Howard Schutz, Howard Moskowitz, Rose Marie Pangborn, Beverley Kroll, W. Frank Shipe, Lawrence E. Marks, Joseph C. Stevens, Arye Dethmers, Barbara Klein, Ann Noble, Harold Hedrick, William C Stringer, Roger Boulton, Kay McMath, Joel van Wyk, and Roger Mitchell.

    Harry T. Lawless, Ithaca, New York
    Hildegarde Heymann, Davis, California

    Contents

    1 Introduction
      1.1 Introduction and Overview
        1.1.1 Definition
        1.1.2 Measurement
      1.2 Historical Landmarks and the Three Classes of Test Methods
        1.2.1 Difference Testing
        1.2.2 Descriptive Analyses
        1.2.3 Affective Testing
        1.2.4 The Central Dogma: Analytic Versus Hedonic Tests
      1.3 Applications: Why Collect Sensory Data?
        1.3.1 Differences from Marketing Research Methods
        1.3.2 Differences from Traditional Product Grading Systems
      1.4 Summary and Conclusions
      References

    2 Physiological and Psychological Foundations of Sensory Function
      2.1 Introduction
      2.2 Classical Sensory Testing and Psychophysical Methods
        2.2.1 Early Psychophysics
        2.2.2 The Classical Psychophysical Methods
        2.2.3 Scaling and Magnitude Estimation
        2.2.4 Critiques of Stevens
        2.2.5 Empirical Versus Theory-Driven Functions
        2.2.6 Parallels of Psychophysics and Sensory Evaluation
      2.3 Anatomy and Physiology and Functions of Taste
        2.3.1 Anatomy and Physiology
        2.3.2 Taste Perception: Qualities
        2.3.3 Taste Perception: Adaptation and Mixture Interactions
        2.3.4 Individual Differences and Taste Genetics
      2.4 Anatomy and Physiology and Functions of Smell
        2.4.1 Anatomy and Cellular Function

        2.4.2 Retronasal Smell
        2.4.3 Olfactory Sensitivity and Specific Anosmia
        2.4.4 Odor Qualities: Practical Systems
        2.4.5 Functional Properties: Adaptation, Mixture Suppression, and Release
      2.5 Chemesthesis
        2.5.1 Qualities of Chemesthetic Experience
        2.5.2 Physiological Mechanisms of Chemesthesis
        2.5.3 Chemical “Heat”
        2.5.4 Other Irritative Sensations and Chemical Cooling
        2.5.5 Astringency
        2.5.6 Metallic Taste
      2.6 Multi-modal Sensory Interactions
        2.6.1 Taste and Odor Interactions
        2.6.2 Irritation and Flavor
        2.6.3 Color-Flavor Interactions
      2.7 Conclusions
      References

    3 Principles of Good Practice
      3.1 Introduction
      3.2 The Sensory Testing Environment
        3.2.1 Evaluation Area
        3.2.2 Climate Control
      3.3 Test Protocol Considerations
        3.3.1 Sample Serving Procedures
        3.3.2 Sample Size
        3.3.3 Sample Serving Temperatures
        3.3.4 Serving Containers
        3.3.5 Carriers
        3.3.6 Palate Cleansing
        3.3.7 Swallowing and Expectoration
        3.3.8 Instructions to Panelists
        3.3.9 Randomization and Blind Labeling
      3.4 Experimental Design
        3.4.1 Designing a Study
        3.4.2 Design and Treatment Structures
      3.5 Panelist Considerations
        3.5.1 Incentives
        3.5.2 Use of Human Subjects
        3.5.3 Panelist Recruitment
        3.5.4 Panelist Selection and Screening
        3.5.5 Training of Panelists
        3.5.6 Panelist Performance Assessment
      3.6 Tabulation and Analysis
        3.6.1 Data Entry Systems
      3.7 Conclusion
      References

    4 Discrimination Testing
      4.1 Discrimination Testing
      4.2 Types of Discrimination Tests
        4.2.1 Paired Comparison Tests
        4.2.2 Triangle Tests
        4.2.3 Duo-Trio Tests
        4.2.4 n-Alternative Forced Choice (n-AFC) Methods
        4.2.5 A-Not-A tests
        4.2.6 Sorting Methods
        4.2.7 The ABX Discrimination Task
        4.2.8 Dual-Standard Test
      4.3 Reputed Strengths and Weaknesses of Discrimination Tests
      4.4 Data Analyses
        4.4.1 Binomial Distributions and Tables
        4.4.2 The Adjusted Chi-Square (χ²) Test
        4.4.3 The Normal Distribution and the Z-Test on Proportion
      4.5 Issues
        4.5.1 The Power of the Statistical Test
        4.5.2 Replications
        4.5.3 Warm-Up Effects
        4.5.4 Common Mistakes Made in the Interpretation of Discrimination Tests
      Appendix: A Simple Approach to Handling the A, Not-A, and Same/Different Tests
      References

    5 Similarity, Equivalence Testing, and Discrimination Theory
      5.1 Introduction
      5.2 Common Sense Approaches to Equivalence
      5.3 Estimation of Sample Size and Test Power
      5.4 How Big of a Difference Is Important? Discriminator Theory
      5.5 Tests for Significant Similarity
      5.6 The Two One-Sided Test Approach (TOST) and Interval Testing
      5.7 Claim Substantiation
      5.8 Models for Discrimination: Signal Detection Theory
        5.8.1 The Problem
        5.8.2 Experimental Setup
        5.8.3 Assumptions and Theory
        5.8.4 An Example
        5.8.5 A Connection to Paired Comparisons Results Through the ROC Curve
      5.9 Thurstonian Scaling
        5.9.1 The Theory and Formulae
        5.9.2 Extending Thurstone’s Model to Other Choice Tests

      5.10 Extensions of the Thurstonian Methods, R-Index
        5.10.1 Short Cut Signal Detection Methods
        5.10.2 An Example
      5.11 Conclusions
      Appendix: Non-Central t-Test for Equivalence of Scaled Data
      References

    6 Measurement of Sensory Thresholds
      6.1 Introduction: The Threshold Concept
      6.2 Types of Thresholds: Definitions
      6.3 Practical Methods: Ascending Forced Choice
      6.4 Suggested Method for Taste/Odor/Flavor Detection Thresholds
        6.4.1 Ascending Forced-Choice Method of Limits
        6.4.2 Purpose of the Test
        6.4.3 Preliminary Steps
        6.4.4 Procedure
        6.4.5 Data Analysis
        6.4.6 Alternative Graphical Solution
        6.4.7 Procedural Choices
      6.5 Case Study/Worked Example
      6.6 Other Forced Choice Methods
      6.7 Probit Analysis
      6.8 Sensory Adaptation, Sequential Effects, and Variability
      6.9 Alternative Methods: Rated Difference, Adaptive Procedures, Scaling
        6.9.1 Rated Difference from Control
        6.9.2 Adaptive Procedures
        6.9.3 Scaling as an Alternative Measure of Sensitivity
      6.10 Dilution to Threshold Measures
        6.10.1 Odor Units and Gas-Chromatography Olfactometry (GCO)
        6.10.2 Scoville Units
      6.11 Conclusions
      Appendix: MTBE Threshold Data for Worked Example
      References

    7 Scaling
      7.1 Introduction
      7.2 Some Theory
      7.3 Common Methods of Scaling
        7.3.1 Category Scales
        7.3.2 Line Scaling
        7.3.3 Magnitude Estimation
      7.4 Recommended Practice and Practical Guidelines
        7.4.1 Rule 1: Provide Sufficient Alternatives
        7.4.2 Rule 2: The Attribute Must Be Understood
        7.4.3 Rule 3: The Anchor Words Should Make Sense
        7.4.4 To Calibrate or Not to Calibrate
        7.4.5 A Warning: Grading and Scoring are Not Scaling

      7.5 Variations: Other Scaling Techniques
        7.5.1 Cross-Modal Matches and Variations on Magnitude Estimation
        7.5.2 Category-Ratio (Labeled Magnitude) Scales
        7.5.3 Adjustable Rating Techniques: Relative Scaling
        7.5.4 Ranking
        7.5.5 Indirect Scales
      7.6 Comparing Methods: What is a Good Scale?
      7.7 Issues
        7.7.1 “Do People Make Relative Judgments” Should They See Their Previous Ratings?
        7.7.2 Should Category Rating Scales Be Assigned Integer Numbers in Data Tabulation? Are They Interval Scales?
        7.7.3 Is Magnitude Estimation a Ratio Scale or Simply a Scale with Ratio Instructions?
        7.7.4 What is a “Valid” Scale?
      7.8 Conclusions
      Appendix 1: Derivation of Thurstonian-Scale Values for the 9-Point Scale
      Appendix 2: Construction of Labeled Magnitude Scales
      References

    8 Time-Intensity Methods
      8.1 Introduction
      8.2 A Brief History
      8.3 Variations on the Method
        8.3.1 Discrete or Discontinuous Sampling
        8.3.2 “Continuous” Tracking
        8.3.3 Temporal Dominance Techniques
      8.4 Recommended Procedures
        8.4.1 Steps in Conducting a Time-intensity Study
        8.4.2 Procedures
        8.4.3 Recommended Analysis
      8.5 Data Analysis Options
        8.5.1 General Approaches
        8.5.2 Methods to Construct or Describe Average Curves
        8.5.3 Case Study: Simple Geometric Description
        8.5.4 Analysis by Principal Components
      8.6 Examples and Applications
        8.6.1 Taste and Flavor Sensation Tracking
        8.6.2 Trigeminal and Chemical/Tactile Sensations
        8.6.3 Taste and Odor Adaptation
        8.6.4 Texture and Phase Change
        8.6.5 Flavor Release
        8.6.6 Temporal Aspects of Hedonics
      8.7 Issues
      8.8 Conclusions
      References

    9 Context Effects and Biases in Sensory Judgment
      9.1 Introduction: The Relative Nature of Human Judgment
      9.2 Simple Contrast Effects
        9.2.1 A Little Theory: Adaptation Level
        9.2.2 Intensity Shifts
        9.2.3 Quality Shifts
        9.2.4 Hedonic Shifts
        9.2.5 Explanations for Contrast
      9.3 Range and Frequency Effects
        9.3.1 A Little More Theory: Parducci’s Range and Frequency Principles
        9.3.2 Range Effects
        9.3.3 Frequency Effects
      9.4 Biases
        9.4.1 Idiosyncratic Scale Usage and Number Bias
        9.4.2 Poulton’s Classifications
        9.4.3 Response Range Effects
        9.4.4 The Centering Bias
      9.5 Response Correlation and Response Restriction
        9.5.1 Response Correlation
        9.5.2 “Dumping” Effects: Inflation Due to Response Restriction in Profiling
        9.5.3 Over-Partitioning
      9.6 Classical Psychological Errors and Other Biases
        9.6.1 Errors in Structured Sequences: Anticipation and Habituation
        9.6.2 The Stimulus Error
        9.6.3 Positional or Order Bias
      9.7 Antidotes
        9.7.1 Avoid or Minimize
        9.7.2 Randomization and Counterbalancing
        9.7.3 Stabilization and Calibration
        9.7.4 Interpretation
      9.8 Conclusions
      References

    10 Descriptive Analysis
      10.1 Introduction
      10.2 Uses of Descriptive Analyses
      10.3 Language and Descriptive Analysis
      10.4 Descriptive Analysis Techniques
        10.4.1 Flavor Profile®
        10.4.2 Quantitative Descriptive Analysis®
        10.4.3 Texture Profile®
        10.4.4 Sensory Spectrum®
      10.5 Generic Descriptive Analysis
        10.5.1 How to Do Descriptive Analysis in Three Easy Steps

        10.5.2 Studies Comparing Different Conventional Descriptive Analysis Techniques
      10.6 Variations on the Theme
        10.6.1 Using Attribute Citation Frequencies Instead of Attribute Intensities
        10.6.2 Deviation from Reference Method
        10.6.3 Intensity Variation Descriptive Method
        10.6.4 Combination of Descriptive Analysis and Time-Related Intensity Methods
        10.6.5 Free Choice Profiling
        10.6.6 Flash Profiling
      References

    11 Texture Evaluation
      11.1 Texture Defined
      11.2 Visual, Auditory, and Tactile Texture
        11.2.1 Visual Texture
        11.2.2 Auditory Texture
        11.2.3 Tactile Texture
        11.2.4 Tactile Hand Feel
      11.3 Sensory Texture Measurements
        11.3.1 Texture Profile Method
        11.3.2 Other Sensory Texture Evaluation Techniques
        11.3.3 Instrumental Texture Measurements and Sensory Correlations
      11.4 Conclusions
      References

    12 Color and Appearance
      12.1 Color and Appearance
      12.2 What Is Color?
      12.3 Vision
        12.3.1 Normal Human Color Vision Variations
        12.3.2 Human Color Blindness
      12.4 Measurement of Appearance and Color Attributes
        12.4.1 Appearance
        12.4.2 Visual Color Measurement
      12.5 Instrumental Color Measurement
        12.5.1 Munsell Color Solid
        12.5.2 Mathematical Color Systems
      12.6 Conclusions
      References

    13 Preference Testing
      13.1 Introduction: Consumer Sensory Evaluation
      13.2 Preference Tests: Overview
        13.2.1 The Basic Comparison
        13.2.2 Variations
        13.2.3 Some Cautions
      13.3 Simple Paired Preference Testing

    xvi

    Contents

    13.3.1 Recommended Procedure . . . . . . . . . . . . . . . 13.3.2 Statistical Basis . . . . . . . . . . . . . . . . . . . . 13.3.3 Worked Example . . . . . . . . . . . . . . . . . . . 13.3.4 Useful Statistical Approximations . . . . . . . . . . 13.3.5 The Special Case of Equivalence Testing . . . . . . . 13.4 Non-forced Preference . . . . . . . . . . . . . . . . . . . . . . 13.5 Replicated Preference Tests . . . . . . . . . . . . . . . . . . . 13.6 Replicated Non-forced Preference . . . . . . . . . . . . . . . . 13.7 Other Related Methods . . . . . . . . . . . . . . . . . . . . . 13.7.1 Ranking . . . . . . . . . . . . . . . . . . . . . . . . 13.7.2 Analysis of Ranked Data . . . . . . . . . . . . . . . 13.7.3 Best-Worst Scaling . . . . . . . . . . . . . . . . . . 13.7.4 Rated Degree of Preference and Other Options . . . . 13.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix 1: Worked Example of the Ferris k-Visit Repeated Preference Test Including the No-Preference Option . . . . . . Appendix 2: The “Placebo” Preference Test . . . . . . . . . . . . . . . Appendix 3: Worked Example of Multinomial Approach to Analyzing Data with the No-Preference Option . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 Acceptance Testing . . . . . . . . . . . . . . . . . . . . . . 14.1 Introduction: Scaled Liking Versus Choice . . . . . . 14.2 Hedonic Scaling: Quantification of Acceptability . . . 14.3 Recommended Procedure . . . . . . . . . . . . . . . 14.3.1 Steps . . . . . . . . . . . . . . . . . . . . . 14.3.2 Analysis . . . . . . . . . . . . . . . . . . . 14.3.3 Replication . . . . . . . . . . . . . . . . . 14.4 Other Acceptance Scales . . . . . . . . . . . . . . . . 14.4.1 Line Scales . . . . . . . . . . . . . . . . . 14.4.2 Magnitude Estimation . . . . . . . . . . . . 14.4.3 Labeled Magnitude Scales . . . . . . . . . 14.4.4 Pictorial Scales and Testing with Children . 14.4.5 Adjustable Scales . . . . . . . 
. . . . . . . 14.5 Just-About-Right Scales . . . . . . . . . . . . . . . . 14.5.1 Description . . . . . . . . . . . . . . . . . 14.5.2 Limitations . . . . . . . . . . . . . . . . . 14.5.3 Variations on Relative-to-Ideal Scaling . . . 14.5.4 Analysis of JAR Data . . . . . . . . . . . . 14.5.5 Penalty Analysis or “Mean Drop” . . . . . 14.5.6 Other Problems and Issues with JAR Scales 14.6 Behavioral and Context-Related Approaches . . . . . 14.6.1 Food Action Rating Scale (FACT) . . . . . 14.6.2 Appropriateness Scales . . . . . . . . . . . 14.6.3 Acceptor Set Size . . . . . . . . . . . . . . 14.6.4 Barter Scales . . . . . . . . . . . . . . . . 14.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . .

    . . . . . . . . . . . . . . . . . . . . . . . . . . .

    . . . . . . . . . . . . . . . . . . . . . . . . . . .

    . . . . . . . . . . . . . . . . . . . . . . . . . . .

    . . . . . . . . . . . . . . . . . . . . . . . . . . .

    . . . . . . . . . . . . . . . . . . . . . . . . . . .

    306 307 308 309 310 311 313 313 315 315 316 317 318 320 320 321 322 323 325 325 326 327 327 328 328 328 328 330 331 332 333 334 334 335 336 336 339 340 340 341 341 342 343 343 344

    15 Consumer Field Tests and Questionnaire Design
      15.1 Sensory Testing Versus Concept Testing
      15.2 Testing Scenarios: Central Location, Home Use
        15.2.1 Purpose of the Tests
        15.2.2 Consumer Models
        15.2.3 Central Location Tests
        15.2.4 Home Use Tests (HUT)
      15.3 Practical Matters in Conducting Consumer Field Tests
        15.3.1 Tasks and Test Design
        15.3.2 Sample Size and Stratification
        15.3.3 Test Designs
      15.4 Interacting with Field Services
        15.4.1 Choosing Agencies, Communication, and Test Specifications
        15.4.2 Incidence, Cost, and Recruitment
        15.4.3 Some Tips: Do’s and Don’ts
        15.4.4 Steps in Testing with Research Suppliers
      15.5 Questionnaire Design
        15.5.1 Types of Interviews
        15.5.2 Questionnaire Flow: Order of Questions
        15.5.3 Interviewing
      15.6 Rules of Thumb for Constructing Questions
        15.6.1 General Principles
        15.6.2 Brevity
        15.6.3 Use Plain Language
        15.6.4 Accessibility of the Information
        15.6.5 Avoid Vague Questions
        15.6.6 Check for Overlap and Completeness
        15.6.7 Do Not Lead the Respondent
        15.6.8 Avoid Ambiguity and Double Questions
        15.6.9 Be Careful in Wording: Present Both Alternatives
        15.6.10 Beware of Halos and Horns
        15.6.11 Pre-test
      15.7 Other Useful Questions: Satisfaction, Agreement, and Open-Ended Questions
        15.7.1 Satisfaction
        15.7.2 Likert (Agree-Disagree) Scales
        15.7.3 Open-Ended Questions
      15.8 Conclusions
      Appendix 1: Sample Test Specification Sheet
      Appendix 2: Sample Screening Questionnaire
      Appendix 3: Sample Product Questionnaire
      References

    16 Qualitative Consumer Research Methods
      16.1 Introduction
        16.1.1 Resources, Definitions, and Objectives
        16.1.2 Styles of Qualitative Research
        16.1.3 Other Qualitative Techniques
      16.2 Characteristics of Focus Groups
        16.2.1 Advantages
        16.2.2 Key Requirements
        16.2.3 Reliability and Validity
      16.3 Using Focus Groups in Sensory Evaluation
      16.4 Examples, Case Studies
        16.4.1 Case Study 1: Qualitative Research Before Conjoint Measurement in New Product Development
        16.4.2 Case Study 2: Nutritional and Health Beliefs About Salt
      16.5 Conducting Focus Group Studies
        16.5.1 A Quick Overview
        16.5.2 A Key Requirement: Developing Good Questions
        16.5.3 The Discussion Guide and Phases of the Group Interview
        16.5.4 Participant Requirements, Timing, Recording
      16.6 Issues in Moderating
        16.6.1 Moderating Skills
        16.6.2 Basic Principles: Nondirection, Full Participation, and Coverage of Issues
        16.6.3 Assistant Moderators and Co-moderators
        16.6.4 Debriefing: Avoiding Selective Listening and Premature Conclusions
      16.7 Analysis and Reporting
        16.7.1 General Principles
        16.7.2 Suggested Method (“Sorting/Clustering Approach”), also Called Classical Transcript Analysis
        16.7.3 Report Format
      16.8 Alternative Procedures and Variations of the Group Interview
        16.8.1 Groups of Children, Telephone Interviews, Internet-Based Groups
        16.8.2 Alternatives to Traditional Questioning
      16.9 Conclusions
      Appendix: Sample Report Group Report
      Boil-in-bag Pasta Project Followup Groups
      References

    17 Quality Control and Shelf-Life (Stability) Testing
      17.1 Introduction: Objectives and Challenges
      17.2 A Quick Look at Traditional Quality Control
      17.3 Methods for Sensory QC
        17.3.1 Cuttings: A Bad Example
        17.3.2 In-Out (Pass/Fail) System
        17.3.3 Difference from Control Ratings
        17.3.4 Quality Ratings with Diagnostics
        17.3.5 Descriptive Analysis
        17.3.6 A Hybrid Approach: Quality Ratings with Diagnostics
        17.3.7 The Multiple Standards Difference Test
      17.4 Recommended Procedure: Difference Scoring with Key Attribute Scales
      17.5 The Importance of Good Practice
      17.6 Historical Footnote: Expert Judges and Quality Scoring
        17.6.1 Standardized Commodities
        17.6.2 Example 1: Dairy Product Judging
        17.6.3 Example 2: Wine Scoring
      17.7 Program Requirements and Program Development
        17.7.1 Desired Features of a Sensory QC System
        17.7.2 Program Development and Management Issues
        17.7.3 The Problem of Low Incidence
      17.8 Shelf-Life Testing
        17.8.1 Basic Considerations
        17.8.2 Cutoff Point
        17.8.3 Test Designs
        17.8.4 Survival Analysis and Hazard Functions
        17.8.5 Accelerated Storage
      17.9 Summary and Conclusions
      Appendix 1: Sample Screening Tests for Sensory Quality Judges
      Appendix 2: Survival/Failure Estimates from a Series of Batches with Known Failure Times
      Appendix 3: Arrhenius Equation and Q10 Modeling
      References

    18 Data Relationships and Multivariate Applications
      18.1 Introduction
      18.2 Overview of Multivariate Statistical Techniques
        18.2.1 Principal Component Analysis
        18.2.2 Multivariate Analysis of Variance
        18.2.3 Discriminant Analysis (Also Known as Canonical Variate Analysis)
        18.2.4 Generalized Procrustes Analysis
      18.3 Relating Consumer and Descriptive Data Through Preference Mapping
        18.3.1 Internal Preference Mapping
        18.3.2 External Preference Mapping
      18.4 Conclusions
      References

    19 Strategic Research
      19.1 Introduction
        19.1.1 Avenues for Strategic Research
        19.1.2 Consumer Contact
      19.2 Competitive Surveillance
        19.2.1 The Category Review
        19.2.2 Perceptual Mapping
        19.2.3 Multivariate Methods: PCA
        19.2.4 Multi-dimensional Scaling
        19.2.5 Cost-Efficient Methods for Data Collection: Sorting
        19.2.6 Vector Projection
        19.2.7 Cost-Efficient Methods for Data Collection: Projective Mapping, aka Napping
      19.3 Attribute Identification and Classification
        19.3.1 Drivers of Liking
        19.3.2 The Kano Model
      19.4 Preference Mapping Revisited
        19.4.1 Types of Preference Maps
        19.4.2 Preference Models: Vectors Versus Ideal Points
      19.5 Consumer Segmentation
      19.6 Claim Substantiation Revisited
      19.7 Conclusions
        19.7.1 Blind Testing, New Coke, and the Vienna Philharmonic
        19.7.2 The Sensory Contribution
      References

    Appendix A Basic Statistical Concepts for Sensory Evaluation
      A.1 Introduction
      A.2 Basic Statistical Concepts
        A.2.1 Data Description
        A.2.2 Population Statistics
      A.3 Hypothesis Testing and Statistical Inference
        A.3.1 The Confidence Interval
        A.3.2 Hypothesis Testing
        A.3.3 A Worked Example
        A.3.4 A Few More Important Concepts
        A.3.5 Decision Errors
      A.4 Variations of the t-Test
        A.4.1 The Sensitivity of the Dependent t-Test for Sensory Data
      A.5 Summary: Statistical Hypothesis Testing
      A.6 Postscript: What p-Values Signify and What They Do Not
      A.7 Statistical Glossary
      References

    Appendix B Nonparametric and Binomial-Based Statistical Methods
      B.1 Introduction to Nonparametric Tests
      B.2 Binomial-Based Tests on Proportions
      B.3 Chi-Square
        B.3.1 A Measure of Relatedness of Two Variables
        B.3.2 Calculations
        B.3.3 Related Samples: The McNemar Test
        B.3.4 The Stuart-Maxwell Test
        B.3.5 Beta-Binomial, Chance-Corrected Beta-Binomial, and Dirichlet Multinomial Analyses
      B.4 Useful Rank Order Tests
        B.4.1 The Sign Test
        B.4.2 The Mann-Whitney U-Test
        B.4.3 Ranked Data with More Than Two Samples, Friedman and Kramer Tests
        B.4.4 Rank Order Correlation
      B.5 Conclusions
      B.6 Postscript
        B.6.1 Proof showing equivalence of binomial approximation Z-test and χ² test for difference of proportions
      References

    Appendix C Analysis of Variance
      C.1 Introduction
        C.1.1 Overview
        C.1.2 Basic Analysis of Variance
        C.1.3 Rationale
        C.1.4 Calculations
        C.1.5 A Worked Example
      C.2 Analysis of Variance from Complete Block Designs
        C.2.1 Concepts and Partitioning Panelist Variance from Error
        C.2.2 The Value of Using Panelists As Their Own Controls
      C.3 Planned Comparisons Between Means Following ANOVA
      C.4 Multiple Factor Analysis of Variance
        C.4.1 An Example
        C.4.2 Concept: A Linear Model
        C.4.3 A Note About Interactions
      C.5 Panelist by Product by Replicate Designs
      C.6 Issues and Concerns
        C.6.1 Sensory Panelists: Fixed or Random Effects?
        C.6.2 A Note on Blocking
        C.6.3 Split-Plot or Between-Groups (Nested) Designs
        C.6.4 Statistical Assumptions and the Repeated Measures ANOVA
        C.6.5 Other Options
      References

    Appendix D Correlation, Regression, and Measures of Association
      D.1 Introduction
      D.2 Correlation
        D.2.1 Pearson’s Correlation Coefficient Example
        D.2.2 Coefficient of Determination
      D.3 Linear Regression
        D.3.1 Analysis of Variance
        D.3.2 Analysis of Variance for Linear Regression
        D.3.3 Prediction of the Regression Line
        D.3.4 Linear Regression Example
      D.4 Multiple Linear Regression
      D.5 Other Measures of Association
        D.5.1 Spearman Rank Correlation
        D.5.2 Spearman Correlation Coefficient Example
        D.5.3 Cramér’s V Measure
        D.5.4 Cramér Coefficient Example
      References

    Appendix E Statistical Power and Test Sensitivity
      E.1 Introduction
      E.2 Factors Affecting the Power of Statistical Tests
        E.2.1 Sample Size and Alpha Level
        E.2.2 Effect Size
        E.2.3 How Alpha, Beta, Effect Size, and N Interact
      E.3 Worked Examples
        E.3.1 The t-Test
        E.3.2 An Equivalence Issue with Scaled Data
        E.3.3 Sample Size for a Difference Test
      E.4 Power in Simple Difference and Preference Tests
      E.5 Summary and Conclusions
      References

    Appendix F Statistical Tables
      Table F.A Cumulative probabilities of the standard normal distribution. Entry area 1-α under the standard normal curve from −∞ to z(1-α)
      Table F.B Table of critical values for the t-distribution
      Table F.C Table of critical values of the chi-square (χ²) distribution
      Table F.D1 Critical values of the F-distribution at α = 0.05
      Table F.D2 Critical values of the F-distribution at α = 0.01
      Table F.E Critical values of U for a one-tailed alpha at 0.025 or a two-tailed alpha at 0.05
      Table F.F1 Table of critical values of ρ (Spearman Rank correlation coefficient)
      Table F.F2 Table of critical values of r (Pearson’s correlation coefficient)
      Table F.G Critical values for Duncan’s multiple range test (p, df, α = 0.05)
      Table F.H1 Critical values of the triangle test for similarity (maximum number correct as a function of the number of observations (N), beta, and proportion discriminating)
      Table F.H2 Critical values of the duo-trio and paired comparison tests for similarity (maximum number correct as a function of the number of observations (N), beta, and proportion discriminating)
      Table F.I Table of probabilities for values as small as observed values of x associated with the binomial test (p = 0.50)
      Table F.J Critical values for the differences between rank sums (α = 0.05)
      Table F.K Critical values of the beta binomial distribution
      Table F.L Minimum numbers of correct judgments to establish significance at probability levels of 5 and 1% for paired difference and duo-trio tests (one tailed, p = 1/2) and the triangle test (one tailed, p = 1/3)
      Table F.M Minimum numbers of correct judgments to establish significance at probability levels of 5 and 1% for paired preference test (two tailed, p = 1/2)
      Table F.N1 Minimum number of responses (n) and correct responses (x) to obtain a level of Type I and Type II risks in the triangle test. Pd is the chance-adjusted percent correct or proportion of discriminators
      Table F.N2 Minimum number of responses (n) and correct responses (x) to obtain a level of Type I and Type II risks in the duo-trio test. Pc is the chance-adjusted percent correct or proportion of discriminators
      Table F.O1 d and B (variance factor) values for the duo-trio and 2-AFC (paired comparison) difference tests
      Table F.O2 d and B (variance factor) values for the triangle and 3-AFC difference tests
      Table F.P Random permutations of nine
      Table F.Q Random numbers

    Author Index

    Subject Index

    Chapter 1

    Introduction

    Abstract In this chapter we carefully parse the definition of sensory evaluation and briefly discuss the validity of the data collected before outlining the early history of the field. We then describe the three main methods used in sensory evaluation (discrimination tests, descriptive analysis, and hedonic testing) before discussing the differences between analytical and consumer testing. We then briefly discuss why one may want to collect sensory data. In the final sections we highlight the differences and similarities between sensory evaluation and marketing research and between sensory evaluation and commodity grading as used in, for example, the dairy industry.

    Sensory evaluation is a child of industry. It was spawned in the late 40’s by the rapid growth of the consumer product companies, mainly food companies. . . . Future development in sensory evaluation will depend upon several factors, one of the most important being the people and their preparation and training. – Elaine Skinner (1989)

    Contents
      1.1 Introduction and Overview
        1.1.1 Definition
        1.1.2 Measurement
      1.2 Historical Landmarks and the Three Classes of Test Methods
        1.2.1 Difference Testing
        1.2.2 Descriptive Analyses
        1.2.3 Affective Testing
        1.2.4 The Central Dogma: Analytic Versus Hedonic Tests
      1.3 Applications: Why Collect Sensory Data?
        1.3.1 Differences from Marketing Research Methods
        1.3.2 Differences from Traditional Product Grading Systems
      1.4 Summary and Conclusions
      References

    1.1 Introduction and Overview

    1.1.1 Definition

    The field of sensory evaluation grew rapidly in the second half of the twentieth century, along with the expansion of the processed food and consumer products industries. Sensory evaluation comprises a set of techniques for accurate measurement of human responses to foods and minimizes the potentially biasing effects of brand identity and other information influences on consumer perception. As such, it attempts to isolate the sensory properties of foods themselves and provides important and useful information to product developers, food scientists, and managers about the sensory characteristics of their products. The field was comprehensively reviewed by Amerine, Pangborn, and Roessler in 1965, and more recent texts have been published by Moskowitz et al. (2006), Stone and Sidel (2004), and Meilgaard et al. (2006). These three later sources are practical works aimed at sensory specialists in industry and reflect the philosophies of the consulting groups of the authors. Our goal in this book is to provide a comprehensive overview of the field with a balanced view based on research findings and one that is suited to students and practitioners alike.

    H.T. Lawless, H. Heymann, Sensory Evaluation of Food, Food Science Text Series, DOI 10.1007/978-1-4419-6488-5_1, © Springer Science+Business Media, LLC 2010

    Sensory evaluation has been defined as a scientific method used to evoke, measure, analyze, and interpret those responses to products as perceived through the senses of sight, smell, touch, taste, and hearing (Stone and Sidel, 2004). This definition has been accepted and endorsed by sensory evaluation committees within various professional organizations such as the Institute of Food Technologists and the American Society for Testing and Materials. The principles and practices of sensory evaluation involve each of the four activities mentioned in this definition.

    Consider the words “to evoke.” Sensory evaluation gives guidelines for the preparation and serving of samples under controlled conditions so that biasing factors are minimized. For example, people in a sensory test are often placed in individual test booths so that the judgments they give are their own and do not reflect the opinions of those around them. Samples are labeled with random numbers so that people do not form judgments based upon labels, but rather on their sensory experiences. As another example, products may be presented to each participant in a different order to help measure and counterbalance the sequential effects of seeing one product after another. Standard procedures may be established for sample temperature, volume, and spacing in time, as needed to control unwanted variation and improve test precision.

    Next, consider the words “to measure.” Sensory evaluation is a quantitative science in which numerical data are collected to establish lawful and specific relationships between product characteristics and human perception. Sensory methods draw heavily from the techniques of behavioral research in observing and quantifying human responses. For example, we can assess the proportion of times people are able to discriminate small product changes or the proportion of a group that expresses a preference for one product over another. Another example is having people generate numerical responses reflecting their perception of how strong a product may taste or smell. Techniques of behavioral research and experimental psychology offer guidelines as to how such measurement techniques should be employed and what their potential pitfalls and liabilities may be.
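
    The proportion-based measurements mentioned above can be illustrated with a minimal sketch. It assumes an exact binomial model, in which each panelist's response is an independent trial. The function names and the counts in the usage lines are hypothetical and are not taken from this book. A chance level of 1/3 corresponds to guessing in a triangle test, and 1/2 to a paired preference with no true preference.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def discrimination_p_value(correct: int, n: int, chance: float) -> float:
    """One-sided exact p-value: the probability of obtaining at least
    `correct` correct choices out of `n` if every panelist were guessing."""
    if not 0 <= correct <= n:
        raise ValueError("correct must lie between 0 and n")
    return binomial_tail(correct, n, chance)


def preference_p_value(prefer_a: int, n: int) -> float:
    """Two-sided exact p-value for a paired preference test under the
    null hypothesis of no preference (p = 0.5)."""
    if not 0 <= prefer_a <= n:
        raise ValueError("prefer_a must lie between 0 and n")
    upper = binomial_tail(prefer_a, n, 0.5)      # P(X >= prefer_a)
    lower = binomial_tail(n - prefer_a, n, 0.5)  # equals P(X <= prefer_a) by symmetry
    return min(1.0, 2 * min(upper, lower))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(discrimination_p_value(correct=14, n=30, chance=1 / 3))
    print(preference_p_value(prefer_a=62, n=100))
```

    A small p-value suggests that the observed proportion is unlikely to arise from guessing or from a population with no preference. It does not by itself indicate how large or how important the difference is.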


    The third process in sensory evaluation is analysis. Proper analysis of the data is a critical part of sensory testing. Data generated from human observers are often highly variable. There are many sources of variation in human responses that cannot be completely controlled in a sensory test. Examples include the mood and motivation of the participants, their innate physiological sensitivity to sensory stimulation, and their past history and familiarity with similar products. While some screening may occur for these factors, they may be only partially controlled, and panels of humans are by their nature heterogeneous instruments for the generation of data. In order to assess whether the relationships observed between product characteristics and sensory responses are likely to be real, and not merely the result of uncontrolled variation in responses, the methods of statistics are used to analyze evaluation data. Hand-in-hand with using appropriate statistical analyses is the concern of using good experimental design, so that the variables of interest are investigated in a way that allows sensible conclusions to be drawn.

    The fourth process in sensory evaluation is the interpretation of results. A sensory evaluation exercise is necessarily an experiment. In experiments, data and statistical information are only useful when interpreted in the context of hypotheses, background knowledge, and implications for decisions and actions to be taken. Conclusions must be drawn that are reasoned judgments based upon data, analyses, and results. Conclusions involve consideration of the method, the limitations of the experiment, and the background and contextual framework of the study. Sensory evaluation specialists are thus more than mere conduits for experimental results; they must contribute interpretations and suggest reasonable courses of action in light of the numbers. They should be full partners with their clients, the end-users of the test results, in guiding further research. The sensory evaluation professional is in the best situation to realize the appropriate interpretation of test results and the implications for the perception of products by the wider group of consumers to whom the results may be generalized. The sensory specialist best understands the limitations of the test procedure and what its risks and liabilities may be.

    A sensory scientist who is prepared for a career in research must be trained in all four of the phases mentioned in the definition. They must understand products, people as measuring instruments, statistical

    1.1 Introduction and Overview

    1.1.2 Measurement

    Sensory evaluation is a science of measurement. Like other analytical test procedures, sensory evaluation is concerned with precision, accuracy, sensitivity, and avoiding false positive results (Meiselman, 1993). Precision is similar to the concept in the behavioral sciences of reliability. In any test procedure, we would like to be able to get the same result when a test is repeated. There is usually some error variance around an obtained value, so that upon repeat testing, the value will not always be exactly the same. This is especially true of sensory tests in which human perceptions are necessarily part of the generation of data. However, in many sensory test procedures, it is desirable to minimize this error variance as much as possible and to have tests that are low in error associated with repeated measurements. This is achieved by several means. As noted above, we isolate the sensory response to the factors of interest, minimizing extraneous influences, controlling sample preparation and presentation. Additionally, as necessary, sensory scientists screen and train panel participants. A second concern is the accuracy of a test. In the physical sciences, this is viewed as the ability of a test instrument to produce a value that is close to the “true” value, as defined by independent measurement from another instrument or set of instruments that have been appropriately calibrated. In the behavioral sciences, the related principle is called the validity of a test. This concerns the ability of a test procedure to measure what it was designed and intended to measure. Validity is established in a number of ways. One useful criterion is predictive validity, when a test result is of value in predicting what would occur in another situation or another measurement. In sensory testing, for example, the test results should reflect the perceptions and opinions of consumers that might buy the product. 
In other words, the results of the sensory test should generalize to the larger population. The test results might correlate with instrumental measures, process or ingredient variables, storage factors, shelf life times,


    or other conditions known to affect sensory properties. In considering validity, we have to look at the end use of the information provided by a test. A sensory test method might be valid for some purposes, but not others (Meiselman, 1993). A simple difference test can tell if a product has changed, but not whether people will like the new version. A good sensory test will minimize errors in measurement and errors in conclusions and decisions. There are different types of errors that may occur in any test procedure. Whether the test result reflects the true state of the world is an important question, especially when error and uncontrolled variability are inherent in the measurement process. Of primary concern in sensory tests is the sensitivity of the test to differences among products. Another way to phrase this is that a test should not often miss important differences that are present. “Missing a difference” implies an insensitive test procedure. To keep sensitivity high, we must minimize error variance wherever possible by careful experimental controls and by selection and training of panelists where appropriate. The test must involve sufficient numbers of measurements to insure a tight and reliable statistical estimate of the values we obtain, such as means or proportions. In statistical language, detecting true differences is avoiding Type II error and the minimization of β-risk. Discussion of the power and sensitivity of tests from a statistical perspective occurs in Chapter 5 and in the Appendix. The other error that may occur in a test result is that of finding a positive result when none is actually present in the larger population of people and products outside the sensory test. Once again, a positive result usually means detection of a statistically significant difference between test products. It is important to use a test method that avoids false positive results or Type I error in statistical language. 
Basic statistical training and common statistical tests applied to scientific findings are oriented toward avoiding this kind of error. The effects of random chance deviations must be taken into account in deciding if a test result reflects a real difference or whether our result is likely to be due to chance variation. The common procedures of inferential statistics provide assurance that we have limited our possibility of finding a difference where one does not really exist. Statistical procedures reduce this risk to some comfortable level, usually with a ceiling of 5% of all tests we conduct.
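The two kinds of error described above can be made concrete for a forced-choice difference test. The following Python sketch is illustrative only and is not taken from the text. It assumes a triangle test (chance probability of 1/3), a one-sided exact binomial decision rule, and a hypothetical true proportion of correct answers `p_true`; all function names are invented for this example.

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n, p_chance, alpha):
    """Smallest count of correct answers that is significant at level alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, p_chance) <= alpha:
            return k
    return n + 1  # no possible outcome reaches significance


def error_rates(n, p_true, p_chance=1 / 3, alpha=0.05):
    """Return (type_i, type_ii) for a test with n judgments.

    type_i  : chance of declaring a difference when panelists are only guessing
    type_ii : chance of missing a difference when the true proportion
              correct is p_true (an assumed value, not an estimate)
    """
    k = critical_count(n, p_chance, alpha)
    type_i = upper_tail(n, k, p_chance)
    type_ii = 1.0 - upper_tail(n, k, p_true)
    return type_i, type_ii


if __name__ == "__main__":
    for n in (12, 24, 48):
        type_i, type_ii = error_rates(n, p_true=0.5)
        print(f"n={n}: Type I = {type_i:.3f}, Type II = {type_ii:.3f}")
```

Under these assumptions the Type I rate stays at or below the 5% ceiling for every panel size, while the Type II rate falls as more judgments are collected, which is why sensitivity depends on having sufficient numbers of measurements.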


    (Lawless and Klein, 1989). An important function of sensory scientists in an academic setting is to provide consulting and resources to insure that quality tests are conducted by other researchers and students who seek to understand the sensory impact of the variables they are studying. In government services such as food inspection, sensory evaluation plays a key role (York, 1995). Sensory principles and appropriate training can be key in insuring that test methods reflect the current knowledge of sensory function and test design. See Lawless (1993) for an overview of the education and training of sensory scientists; much of this piece still rings true more than 15 years later.

    1.2 Historical Landmarks and the Three Classes of Test Methods

    The human senses have been used for centuries to evaluate the quality of foods. We all form judgments about foods whenever we eat or drink (“Everyone carries his own inch-rule of taste, and amuses himself by applying it, triumphantly, wherever he travels.” - Henry Adams, 1918). This does not mean that all judgments are useful or that anyone is qualified to participate in a sensory test. In the past, production of good quality foods often depended upon the sensory acuity of a single expert who was in charge of production or made decisions about process changes in order to make sure the product would have desirable characteristics. This was the historical tradition of brewmasters, wine tasters, dairy judges, and other food inspectors who acted as the arbiters of quality. Modern sensory evaluation replaced these single authorities with panels of people participating in specific test methods that took the form of planned experiments. This change occurred for several reasons. First, it was recognized that the judgments of a panel would in general be more reliable than the judgments of a single individual, and it entailed less risk since the single expert could become ill, travel, retire, die, or be otherwise unavailable to make decisions. Replacement of such an individual was a nontrivial problem. Second, the expert might or might not reflect what consumers or segments of the consuming public might want in a product. Thus for issues of product quality and overall appeal, it was safer (although often more time consuming and expensive)


    to go directly to the target population. Although the tradition of informal, qualitative inspections such as benchtop “cuttings” persists in some industries, they have been gradually replaced by more formal, quantitative, and controlled observations (Stone and Sidel, 2004). The current sensory evaluation methods comprise a set of measurement techniques with established track records of use in industry and academic research. Much of what we consider standard procedures comes from pitfalls and problems encountered in the practical experience of sensory specialists over the last 70 years of food and consumer product research, and this experience is considerable. The primary concern of any sensory evaluation specialist is to insure that the test method is appropriate to answer the questions being asked about the product in the test. For this reason, tests are usually classified according to their primary purpose and most valid use. Three types of sensory testing are commonly used, each with a different goal and each using participants selected using different criteria. A summary of the three main types of testing is given in Table 1.1.

    1.2.1 Difference Testing

    The simplest sensory tests merely attempt to answer whether any perceptible difference exists between two types of products. These are the discrimination tests or simple difference testing procedures. Analysis is usually based on the statistics of frequencies and proportions (counting right and wrong answers). From the test results, we infer differences based on the proportions of persons who are able to choose a test product correctly from among a set of similar or control products. A classic example of this test was the triangle procedure, used in the Carlsberg breweries and in the Seagrams distilleries in the 1940s (Helm and Trolle,


    1946; Peryam and Swartz, 1950). In this test, two products were from the same batch while a third product was different. Judges would be asked to pick the odd sample from among the three. Ability to discriminate differences would be inferred from consistent correct choices above the level expected by chance. In breweries, this test served primarily as a means to screen judges for beer evaluation, to insure that they possessed sufficient discrimination abilities. Another multiple-choice difference test was developed at about the same time in distilleries for purposes of quality control (Peryam and Swartz, 1950). In the duo-trio procedure, a reference sample was given and then two test samples. One of the test samples matched the reference while the other was from a different product, batch or process. The participant would try to match the correct sample to the reference, with a chance probability of one-half. As in the triangle test, a proportion of correct choices above that expected by chance is considered evidence for a perceivable difference between products. A third popular difference test was the paired comparison, in which participants would be asked to choose which of two products was stronger or more intense in a given attribute. Partly due to the fact that the panelist’s attention is directed to a specific attribute, this test is very sensitive to differences. These three common difference tests are shown in Fig. 1.1. Simple difference tests have proven very useful in application and are in widespread use today. Typically a discrimination test will be conducted with 25-40 participants who have been screened for their sensory acuity to common product differences and who are familiar with the test procedures. This generally provides an adequate sample size for documenting clear sensory differences. Often a replicate test is performed while the respondents are present in the sensory test facility. 
In part, the popularity of these tests is due to the simplicity of data analysis. Statistical tables derived from the binomial distribution give the minimum number of correct responses needed to conclude statistical

    Table 1.1 Classification of test methods in sensory evaluation

    Class | Question of interest | Type of test | Panelist characteristics
    Discrimination | Are products perceptibly different in any way | “Analytic” | Screened for sensory acuity, oriented to test method, sometimes trained
    Descriptive | How do products differ in specific sensory characteristics | “Analytic” | Screened for sensory acuity and motivation, trained or highly trained
    Affective | How well are products liked or which products are preferred | “Hedonic” | Screened for products, untrained


    Fig. 1.1 Common methods for discrimination testing include the triangle, duo-trio, and paired comparison procedures.

    significance as a function of the number of participants. Thus a sensory technician merely needs to count answers and refer to a table to give a simple statistical conclusion, and results can be easily and quickly reported.
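The lookup tables mentioned above can be regenerated from the binomial distribution. The sketch below is an illustration under stated assumptions rather than the book's own procedure: it uses a one-sided exact binomial criterion at a 5% level, with chance probabilities of 1/3 for the triangle test and 1/2 for the duo-trio test, and the function names are hypothetical.

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) when X counts correct answers from n guessing panelists."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct(n, p_chance, alpha=0.05):
    """Minimum number of correct responses needed for significance.

    Returns None when even a perfect score is not significant
    (the panel is too small for the chosen alpha).
    """
    for k in range(n + 1):
        if upper_tail(n, k, p_chance) <= alpha:
            return k
    return None


if __name__ == "__main__":
    print("n  triangle  duo-trio")
    for n in (12, 24, 30, 36):
        print(n, min_correct(n, 1 / 3), min_correct(n, 1 / 2))
```

With 12 judgments in a triangle test, for example, this criterion requires at least 8 correct answers. A technician using such a table only has to compare the observed count with the tabulated minimum.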

    the consensus of a panel was likely to be more reliable and accurate than the judgment of a single individual. Second, it provided a means to characterize the individual attributes of flavor and provide a comprehensive analytical description of differences among a group of products under development. Several variations and refinements in descriptive analysis techniques were forthcoming. A group at the General Foods Technical Center in the early 1960s developed and refined a method to quantify food texture, much as the flavor profile had enabled the quantification of flavor properties (Brandt et al., 1963; Szczesniak et al., 1975). This technique, the Texture Profile method, used a fixed set of force-related and shape-related attributes to characterize the rheological and tactile properties of foods and how these changed over time with mastication. These characteristics have parallels in the physical evaluation of food breakdown or flow. For example, perceived hardness is related to the physical force required to penetrate a sample. Perceived thickness of a fluid or semisolid is related in part to physical viscosity. Texture profile panelists were also trained to recognize specific intensity points along each scale, using standard products or formulated pseudo-foods for calibration. Other approaches were developed for descriptive analysis problems. At Stanford Research Institute in the early 1970s, a group proposed a method for descriptive analysis that would remedy some of the apparent shortcomings of the Flavor Profile method


    Table 1.2 Descriptive evaluation of cookies: texture attributes

    Phase | Attribute | Word anchors
    Surface | Roughness | Smooth-rough
    Surface | Particles | None-many
    Surface | Dryness | Oily-dry
    First bite | Fracturability | Crumbly-brittle
    First bite | Hardness | Soft-hard
    First bite | Particle size | Small-large
    First chew | Denseness | Airy-dense
    First chew | Uniformity of chew | Even-uneven
    Chew down | Moisture absorption | None-much
    Chew down | Cohesiveness of mass | Loose-cohesive
    Chew down | Toothpacking | None-much
    Chew down | Grittiness | None-much
    Residual | Oiliness | Dry-oily
    Residual | Particles | None-many
    Residual | Chalky | Not chalky-very chalky


    a uniform and controlled manner, typical of an analytical sensory test procedure. For example, the first bite may be defined as cutting with the incisors. The panel for such an analysis would consist of perhaps 10-12 well-trained individuals, who were oriented to the meanings of the terms and given practice with examples. Intensity references to exemplify scale points are also given in some techniques. Note the amount of detailed information that can be provided in this example and bear in mind that this is only looking at the product’s texture; flavor might form an equally detailed sensory analysis, perhaps with a separate trained panel. The relatively small number of panelists (a dozen or so) is justified due to their level of calibration. Since they have been trained to use attribute scales in a similar manner, error variance is lowered and statistical power and test sensitivity are maintained in spite of fewer observations (fewer data points per product). Similar examples of texture, flavor, fragrance, and tactile analyses can be found in Meilgaard et al. (2006).
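The trade-off between panel size and error variance can be illustrated with the standard error of a mean, which is the standard deviation divided by the square root of the number of observations. The short Python sketch below uses invented standard deviations purely for illustration (0.8 scale points for a calibrated panel, 2.3 for untrained respondents); neither figure comes from the text.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of a mean rating from n independent observations."""
    return sd / sqrt(n)


if __name__ == "__main__":
    # Hypothetical values: a calibrated panel of 12 versus 100 untrained respondents.
    trained = standard_error(sd=0.8, n=12)
    untrained = standard_error(sd=2.3, n=100)
    print(f"trained panel (n=12):  SE = {trained:.3f}")
    print(f"untrained group (n=100): SE = {untrained:.3f}")
```

Under these assumed values the two standard errors are nearly identical (about 0.23 scale points), which shows how lower variability from training can compensate for fewer data points per product.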


    the food product (the stimulus). However, since a sensory perception involves the necessary interaction of a person with a stimulus, it should be apparent that similar test methods are necessary to characterize this person-product interaction.

    1.2.4 The Central Dogma: Analytic Versus Hedonic Tests

    Fig. 1.2 The 9-point hedonic scale used to assess liking and disliking. This scale, originally developed at the U.S. Army Food and Container Institute (Quartermaster Corps), has achieved widespread use in consumer testing of foods.

    The central principle for all sensory evaluation is that the test method should be matched to the objectives of the test. Figure 1.3 shows how the selection of the test procedure flows from questions about the objective of the investigation. To fulfill this goal, it is necessary to have clear communication between the sensory test manager and the client or end-user of the information. A dialogue is often needed. Is the important question whether or not there is any difference at all among the products? If so, a discrimination test is indicated. Is the question one of whether consumers like the new product better than the previous version? A consumer acceptance test is needed. Do we need to know what attributes have changed in the sensory characteristics of the new product? Then a descriptive analysis procedure is called for. Sometimes there are multiple objectives and a sequence of different tests is required (Lawless and Claassen, 1993). This can present problems if all the answers are required at once or under severe time pressure during competitive product development. One of the most important jobs of the sensory specialist in the food industry is to insure a clear understanding and specification of the type of information needed by the end-users. Test design may require a number of conversations, interviews with different people, or even written test requests that specify why the information is to be collected and how the results will be used in making specific decisions and subsequent actions to be taken. The sensory specialist is in the best position to understand the uses and limitations of each procedure and what would be considered appropriate versus inappropriate conclusions from the data. There are two important corollaries to this principle. The sensory test design involves not only the selection of an appropriate method but also the selection of appropriate participants and statistical analyses. 
The three classes of sensory tests can be divided into two types, analytical sensory tests including discrimination


    Fig. 1.3 A flowchart showing methods determination. Based on the major objectives and research questions, different sensory test methods are selected. Similar decision processes are made in panelist selection, setting up response scales, in choosing experimental designs, statistical analysis, and other tasks in designing a sensory test (reprinted with permission from Lawless, 1993).

    and descriptive methods and affective or hedonic tests such as those involved in assessing consumer liking or preferences (Lawless and Claassen, 1993). For the analytical tests, panelists are selected based on having average to good sensory acuity for the critical characteristics (tastes, smells, textures, etc.) of products to be evaluated. They are familiarized with the test procedures and may undergo greater or lesser amounts of training, depending upon the method. In the case of descriptive analysis, they adopt an analytical frame of mind, focusing on specific aspects of the product as directed by the scales on their questionnaires. They are asked to put personal preferences and hedonic reactions aside, as their job is only to specify what attributes are present in the product and at what levels of sensory intensity, extent, amount, or duration. In contrast to this analytical frame of mind, consumers in an affective test act in a much more integrative fashion. They perceive a product as a whole pattern. Although their attention is sometimes captured by a specific aspect of a product (especially if it is a bad, unexpected, or unpleasant one), their reactions to the product are often immediate and based on the integrated pattern of sensory stimulation from the product and expressed as liking or disliking. This occurs without a great deal of thought or dissection of the product’s specific profile. In other words, consumers are effective at rendering impressions based on the integrated pattern of perceptions. In such consumer

    tests, participants must be chosen carefully to insure that the results will generalize to the population of interest. Participants should be frequent users of the product, since they are most likely to form the target market and will be familiar with similar products. They possess reasonable expectations and a frame of reference within which they can form an opinion relative to other similar products they have tried. The analytic/hedonic distinction gives rise to some highly important rules of thumb and some warnings about matching test methods and respondents. It is unwise to ask trained panelists about their preferences or whether they like or dislike a product. They have been asked to assume a different, more analytical frame of mind and to place personal preference aside. Furthermore, they have not necessarily been selected to be frequent users of the product, so they are not part of the target population to which one would like to generalize hedonic test results. A common analogy here is to an analytical instrument. You would not ask a gas chromatograph or a pH meter whether it liked the product, so why ask your analytical descriptive panel (O’Mahony, 1979)? Conversely, problems arise when consumers are asked to furnish very specific information about product attributes. Consumers not only act in a non-analytic frame of mind but also often have very fuzzy concepts about specific attributes, confusing sour and bitter tastes, for example. Individuals often differ markedly


    in their interpretations of sensory attribute words on a questionnaire. While a trained texture profile panel has no trouble in agreeing how cohesive a product is after chewing, we cannot expect consumers to provide precise information on such a specific and technical attribute. In summary, we avoid using trained panelists for affective information and we avoid asking consumers about specific analytical attributes. Related to the analytic-hedonic distinction is the question of whether experimental control and precision are to be maximized or whether validity and generalizability to the real world are more important. Often there is a tradeoff between the two and it is difficult to maximize both simultaneously. Analytic tests in the lab with specially screened and trained judges are more reliable and lower in random error than consumer tests. However, we give up a certain amount of generalizability to real-world results by using artificial conditions and a special group of participants. Conversely, in the testing of products by consumers in their own homes we have not only a lot of real-life validity but also a lot of noise in the data. Brinberg and McGrath (1985) have termed this struggle between precision and validity one of “conflicting desiderata.” O’Mahony (1988) has made a distinction between sensory evaluation Type I and Type II. In Type I sensory evaluation, reliability and sensitivity are key factors, and the participant is viewed much like an analytical instrument used to detect and measure changes in a food product. In Type II sensory evaluation, participants are chosen to be representative of the consuming population, and they may evaluate food under more naturalistic conditions. Their emphasis here is on prediction of consumer response. Every sensory test falls somewhere along a continuum where reliability versus real-life extrapolation are in a potential tradeoff relationship. 
This factor must also be discussed with end-users of the data to see where their emphasis lies and what level of tradeoff they find comfortable. Statistical analyses must also be chosen with an eye to the nature of the data. Discrimination tests involve choices and counting numbers of correct responses. The statistics derived from the binomial distribution or those designed for proportions such as chi-square are appropriate. Conversely, for most scaled data, we can apply the familiar parametric statistics appropriate to normally distributed and continuous data, such as means, standard deviations, t-tests, analysis of variance. The choice of an appropriate statistical test is not


    always straightforward, so sensory specialists are wise to have thorough training in statistics and to involve statistical and design specialists in a complex project in its earliest stages of planning. Occasionally, these central principles are violated. They should not be put aside as a matter of mere expediency or cost savings and never without a logical analysis. One common example is the use of a discrimination test before consumer acceptance. Although our ultimate interest may lie in whether consumers will like or dislike a new product variation, we can conduct a simple difference test to see whether any change is perceivable at all. The logic in this sequence is the following: if a screened and experienced discrimination panel cannot tell the difference under carefully controlled conditions in the sensory laboratory, then a more heterogeneous group of consumers is unlikely to see a difference in their less controlled and more variable world. If no difference is perceived, there can logically be no systematic preference. So a more time consuming and costly consumer test can sometimes be avoided by conducting a simpler but more sensitive discrimination test first. The added reliability of the controlled discrimination test provides a “safety net” for conclusions about consumer perception. Of course, this logic is not without its pitfalls: some consumers may interact extensively with the product during a home use test period and may form stable and important opinions that are not captured in a short duration laboratory test, and there is also always the possibility of a false negative result (the error of missing a difference). MacRae and Geelhoed (1992) describe an interesting case of a missed difference in a triangle test where a significant preference was then observed between water samples in a paired comparison. 
The sensory professional must be aware that these anomalies in experimental results will sometimes arise, and must also be aware of some of the reasons why they occur.
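The matching rules discussed in this section can be summarized as a small lookup. The Python sketch below is only a schematic restatement of Table 1.1 and the rules of thumb above, not a tool described in the text. The objective labels (`any_difference`, `which_attributes`, `liking`), the panel labels, and the function names are all invented for this illustration.

```python
# Entries restate Table 1.1; the dictionary keys are hypothetical labels.
TEST_CLASSES = {
    "any_difference": {"test_class": "Discrimination", "test_type": "analytic"},
    "which_attributes": {"test_class": "Descriptive", "test_type": "analytic"},
    "liking": {"test_class": "Affective", "test_type": "hedonic"},
}


def select_test(objective):
    """Map the question being asked to a class of sensory test."""
    return TEST_CLASSES[objective]


def mismatch_warning(objective, panel):
    """Flag the two mismatches the text warns against.

    panel is either "trained" or "consumer" (labels assumed for this sketch).
    Returns a warning string, or None when no rule of thumb is violated.
    """
    if TEST_CLASSES[objective]["test_type"] == "hedonic" and panel == "trained":
        return "avoid asking trained panelists about liking or preference"
    if objective == "which_attributes" and panel == "consumer":
        return "avoid asking consumers about specific analytical attributes"
    return None


def plan_sequence(difference_perceived):
    """Discrimination test first; consumer test only if a difference was found."""
    steps = ["discrimination test"]
    if difference_perceived:
        steps.append("consumer acceptance test")
    return steps


if __name__ == "__main__":
    print(select_test("liking"))
    print(mismatch_warning("liking", "trained"))
    print(plan_sequence(difference_perceived=False))
```

The sequence function encodes only the simple "safety net" logic. As noted above, a missed difference remains possible, so a null discrimination result is not proof that consumers will have no preference.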

    1.3 Applications: Why Collect Sensory Data?

    Human perceptions of foods and consumer products are the results of complex sensory and interpretation processes. At this stage in scientific history, perceptions of such multidimensional stimuli as conducted


    by the parallel processing of the human nervous system are difficult or impossible to predict from instrumental measures. In many cases instruments lack the sensitivity of human sensory systems; smell is a good example. Instruments rarely mimic the mechanical manipulation of foods when tasted nor do they mimic the types of peri-receptor filtering that occur in biological fluids like saliva or mucus that can cause chemical partitioning of flavor materials. Most importantly, instrumental assessments give values that miss an important perceptual process: the interpretation of sensory experience by the human brain prior to responding. The brain lies interposed between sensory input and the generation of responses that form our data. It is a massively parallel-distributed processor and computational engine, capable of rapid feats of pattern recognition. It comes to the sensory evaluation task complete with a personal history and experiential frame of reference. Sensory experience is interpreted, given meaning within the frame of reference, evaluated relative to expectations and can involve integration of multiple simultaneous or sequential inputs. Finally judgments are rendered as our data. Thus there is a “chain of perception” rather than simply stimulus and response (Meilgaard et al., 2006). Only human sensory data provide the best models for how consumers are likely to perceive and react to food products in real life. We collect, analyze, and interpret sensory data to form predictions about how products have changed during a product development program. In the food and consumer products industries, these changes arise from three important factors: ingredients, processes, and packaging. 
A fourth consideration is often the way a product ages, in other words its shelf life, but we may consider shelf stability to be one special case of processing, albeit usually a very passive one (but also consider products exposed to temperature fluctuation, light-catalyzed oxidation, microbial contamination, and other “abuses”). Ingredient changes arise for a number of reasons. They may be introduced to improve product quality, to reduce costs of production, or simply because a certain supply of raw materials has become unavailable. Processing changes likewise arise from the attempt to improve quality in terms of sensory, nutritional, microbiological stability factors, to reduce costs or to improve manufacturing productivity. Packaging changes arise from considerations of product stability or other quality factors, e.g., a certain


    amount of oxygen permeability may insure that a fresh beef product remains red in color for improved visual appeal to consumers. Packages function as carriers of product information and brand image, so both sensory characteristics and expectations can change as a function of how this information can be carried and displayed by the packaging material and its print overlay. Packaging and print ink may cause changes in flavor or aroma due to flavor transfer out of the product and sometimes transfer of off-flavors into the product. The package also serves as an important barrier to oxidative changes, to the potentially deleterious effects of light-catalyzed reactions, and to microbial infestations and other nuisances. The sensory test is conducted to study how these product manipulations will create perceived changes to human observers. In this sense, sensory evaluation is in the best traditions of psychophysics, the oldest branch of scientific psychology, that attempts to specify the relationships between different energy levels impinging upon the sensory organs (the physical part of psychophysics) and the human response (the psychological part). Often, one cannot predict exactly what the sensory change will be as a function of ingredients, processes, or packaging, or it is very difficult to do so since foods and consumer products are usually quite complex systems. Flavors and aromas depend upon complex mixtures of many volatile chemicals. Informal tasting in the lab may not bring a reliable or sufficient answer to sensory questions. The benchtop in the development laboratory is a poor place to judge potential sensory impact with distractions, competing odors, nonstandard lighting, and so on. Finally, the nose, eyes, and tongue of the product developer may not be representative of most other people who will buy the product. So there is some uncertainty about how consumers will view a product especially under more natural conditions. Uncertainty is the key here. 
If the outcome of a sensory test is perfectly known and predictable, there is no need to conduct the formal evaluation. Unfortunately, useless tests are often requested of a sensory testing group in the industrial setting. The burden of useless routine tests arises from overly entrenched product development sequences, corporate traditions, or merely the desire to protect oneself from blame in the case of unexpected failures. However, the sensory test is only as useful as the amount of reduction in uncertainty that occurs. If there is no uncertainty, there


    is no need for the sensory test. For example, doing a sensory test to see if there is a perceptible color difference between a commercial red wine and a commercial white wine is a waste of resources, since there is no uncertainty! In the industrial setting, sensory evaluation provides a conduit for information that is useful in management business decisions about directions for product development and product changes. These decisions are based on lower uncertainty and lower risk once the sensory information is provided. Sensory evaluation also functions for other purposes. It may be quite useful or even necessary to include sensory analyses in quality control (QC) or quality assurance. Modification of traditional sensory practices may be required to accommodate the small panels and rapid assessments often required in online QC in the manufacturing environment. Due to the time needed to assemble a panel, prepare samples for testing, and analyze and report sensory data, it can be quite challenging to apply sensory techniques to quality control as an online assessment. Quality assurance involving sensory assessments of finished products is more readily amenable to sensory testing and may be integrated with routine programs for shelf life assessment or quality monitoring. Often it is desirable to establish correlations between sensory response and instrumental measures. If this is done well, the instrumental measure can sometimes be substituted for the sensory test. This is especially applicable under conditions in which rapid turnaround is needed. Substitution of instrumental measurements for sensory data may also be useful if the evaluations are likely to be fatiguing to the senses, are repetitive, involve risk in repeated evaluations (e.g., insecticide fragrances), and carry little business risk should an unexpected sensory problem be missed. In addition to these product-focused areas of testing, sensory research is valuable in a broader context.
A sensory test may help to understand the attributes of a product that consumers view as critical to product acceptance and thus success. While we keep a wary eye on the fuzzy way that consumers use language, consumer sensory tests can provide diagnostic information about a product’s points of superiority or shortcomings. Consumer sensory evaluations may suggest hypotheses for further inquiry such as exploration of new product opportunities. There are recurrent themes and enduring problems in sensory science. In 1989, the ASTM Committee

    1 Introduction

    1.3 Applications: Why Collect Sensory Data?

    control in multi-plant manufacturing situations, as well as international product development and the problem of multiple sensory testing sites and panels.
    (4) Environmental and biochemical factors. Kamen recognized that preferences may change as a function of the situation (food often tastes better outdoors and when you are hungry). Meiselman (1993) questioned whether sufficient sensory research is being performed in realistic eating situations that may be more predictive of consumer reactions, and recently sensory scientists have started to explore this area of research (for example, Giboreau and Fleury, 2009; Hein et al., 2009; Mielby and Frøst, 2009).
    (5) Resolving discrepancies between laboratory and field studies. In the search for reliable, detailed, and precise analytical methods in the sensory laboratory, some accuracy in predicting field test results may be lost. Management must be aware of the potential of false positive or negative results if a full testing sequence is not carried out, i.e., if shortcuts are made in the testing sequence prior to marketing a new product. Sensory evaluation specialists in industry do not always have time to study the level of correlation between laboratory and field tests, but a prudent sensory program would include periodic checks on this issue.
    (6) Individual differences. Since Kamen’s era, a growing literature has illuminated the fact that human panelists are not identical, interchangeable measuring instruments. Each comes with different physiological equipment, different frames of reference, different abilities to focus and maintain attention, and different motivational resources. As an example of differences in physiology, we have the growing literature on specific anosmias (smell “blindnesses” to specific chemical compounds) among persons with otherwise normal senses of smell (Boyle et al., 2006; Plotto et al., 2006; Wysocki and Labows, 1984). It should not be surprising that some olfactory characteristics are difficult for even trained panelists to evaluate and to agree on (Bett and Johnson, 1996).
    (7) Relating sensory differences to product variables. This is certainly the meat of sensory science in industrial practice. However, many product developers do not sufficiently involve their sensory specialists in the underlying research questions.


    They also may fall into the trap of never-ending sequences of paired tests, with little or no planned designs and no modeling of how underlying physical variables (ingredients, processes) create a dynamic range of sensory changes. The relation of graded physical changes to sensory response is the essence of psychophysical thinking.
    (8) Sensory interactions. Foods and consumer products are multidimensional. The more sensory scientists understand interactions among characteristics such as enhancement and masking effects, the better they can interpret the results of sensory tests and provide informed judgments and reasoned conclusions in addition to reporting just numbers and statistical significance.
    (9) Sensory education. End-users of sensory data and people who request sensory tests often expect one tool to answer all questions. Kamen cited the simple dichotomy between analytical and hedonic testing (e.g., discrimination versus preference) and how explaining this difference was a constant task. Due to the lack of widespread training in sensory science, the task of sensory education is still with us today, and a sensory professional must be able to explain the rationale behind test methods and communicate the importance and logic of sensory technology to non-sensory scientists and managers.

    Table 1.3 Contrast of sensory evaluation consumer tests with market research tests

    Sensory testing with consumers:
        Participants screened to be users of the product category
        Blind-labeled samples: random codes with minimal conceptual information
        Determines if sensory properties and overall appeal met targets
        Expectations based on similar products used in the category
        Not intended to assess response/appeal of product concept

    Market research testing (concept and/or product test):
        Participants in product-testing phase selected for positive response to concept
        Conceptual claims, information, and frame of reference are explicit
        Expectations derived from concept/claims and similar product usage
        Unable to measure sensory appeal in isolation from concept and expectations

    designed to make the product conceptually appealing (e.g., bringing attention to convenience factors in preparation). In a sensory test all these potentially biasing factors are stripped away in order to isolate the opinion based on sensory properties only. In the tradition of scientific inquiry, we need to isolate the variables of interest (ingredients, processing, packaging changes) and assess sensory properties as a function of these variables, and not as a function of conceptual influences. This is done to minimize the influence of a larger cognitive load of expectations generated from complex conceptual information. There are many potential response biases and task demands that are entailed in “selling” an idea as well as in selling a product. Participants often like to please the experimenter and give results consistent with what they think the person wants. There is a large literature on the effect of factors such as brand label on consumer response. Product information interacts in complex ways with consumer attitudes and expectancies (Aaron et al., 1994; Barrios and Costell, 2004; Cardello and Sawyer, 1992; Costell et al., 2009; Deliza and MacFie, 1996; Giménez et al., 2008; Kimura et al., 2008; Mielby and Frøst, 2009; Park and Lee, 2003; Shepherd et al., 1991/1992). Expectations can cause assimilation of sensory reactions toward what is expected under some conditions and under other conditions will show contrast effects, enhancing differences when expectations are not met (Siegrist and Cousin, 2009; Lee et al., 2006; Yeomans et al., 2008; Zellner et al., 2004). Packaging and brand information will also affect sensory judgments (Dantas et al., 2004; Deliza et al., 1999; Enneking et al., 2007). So the apparent resemblance of a blind sensory test and a fully concept-loaded market research test is quite illusory. Corporate management needs to be reminded of this important distinction. There continues to be


    1.3.2 Differences from Traditional Product Grading Systems

    A second arena of apparent similarity to sensory evaluation is with the traditional product quality grading systems that use sensory criteria. The grading of agricultural commodities is a historically important influence on the movement to assure consumers of quality standards in the foods they purchase. Such techniques were widely applicable to simple products such as fluid milk and butter (Bodyfelt et al., 1988, 2008), where an ideal product could be largely agreed upon and the defects that could arise in poor handling and processing gave rise to well-known sensory effects. Further impetus came from the fact that competitions could be held to examine whether novice judges-in-training could match the opinions of experts. This is much in the tradition of livestock grading: a young person could judge a cow and receive awards at a state fair for learning to use the same criteria and critical eye as the expert judges. There are noteworthy differences in the ways in which sensory testing and quality judging are performed. Some of these are outlined in Table 1.4. The commodity grading and the inspection tradition have severe limitations in the current era of highly processed foods and market segmentation. There are fewer and fewer “standard products” relative to the wide variation in flavors, nutrient levels (e.g., low fat), convenience preparations, and other choices that


    line the supermarket shelves. Also, one person’s product defect may be another’s marketing bonanza, as with the glue that did not work so well and gave us the ubiquitous Post-it notes. Quality judging methods are poorly suited to research support programs. The techniques have been widely criticized on a number of scientific grounds (Claassen and Lawless, 1992; Drake, 2007; O’Mahony, 1979; Pangborn and Dunkley, 1964; Sidel et al., 1981), although they still have their proponents in industry and agriculture (Bodyfelt et al., 1988, 2008). The defect identification in quality grading emphasizes root causes (e.g., oxidized flavor) whereas the descriptive approach uses more elemental singular terms to describe perceptions rather than to infer causes. In the case of oxidized flavors, the descriptive analysis panel might use a number of terms (oily, painty, and fishy) since oxidation causes a number of qualitatively different sensory effects. Another notable difference from mainstream sensory evaluation is that the quality judgments combine an overall quality scale (presumably reflecting consumer dislikes) with diagnostic information about defects, a kind of descriptive analysis looking only at the negative aspects of products. In mainstream sensory evaluation, the descriptive function and the consumer evaluation would be clearly separate in two distinct tests with different respondents. Whether the opinion of a single expert can effectively represent consumer opinion is highly questionable at this time in history.

    Table 1.4 Contrast of sensory evaluation tests with quality inspection

    Sensory testing:
        Separates hedonic (like-dislike) and descriptive information into separate tests
        Uses representative consumers for assessment of product appeal (liking/disliking)
        Uses trained panelists to specify attributes, but not liking/disliking
        Oriented to research support
        Flexible for new, engineered, and innovative products
        Emphasizes statistical inference for decision making, suitable experimental designs, and sample sizes

    Quality inspection:
        Used for pass-fail online decisions in manufacturing
        Provides quality score and diagnostic information concerning defects in one test
        Uses sensory expertise of highly trained individuals
        May use only one or very few trained experts
        Product knowledge, potential problems, and causes are stressed
        Traditional scales are multi-dimensional and poorly suited to statistical analyses
        Decision-making basis may be qualitative
        Oriented to standard commodities


    1.4 Summary and Conclusions

    Sensory evaluation comprises a set of test methods with guidelines and established techniques for product presentation, well-defined response tasks, statistical methods, and guidelines for interpretation of results. Three primary kinds of sensory tests focus on the

    existence of overall differences among products (discrimination tests), specification of attributes (descriptive analysis), and measuring consumer likes and dislikes (affective or hedonic testing). Correct application of sensory technique involves correct matching of method to the objective of the tests, and this requires good communication between sensory specialists and

    [Flowchart: Methods Selection. Consumer acceptability question? If yes, choose from preference/choice, ranking, or rated acceptability, then go to panel setup. If no: sensory analytical question? If no, re-open discussion of objectives. If yes: simple same/different question? If yes, choose from overall difference tests, n-alternative forced choice, or rated difference from control, then go to panel setup. If no: nature-of-difference question? If yes, choose from descriptive analysis techniques or modifications, then go to panel setup.]

    Fig. 1.4 A sensory evaluation department may interact with many other departments in a food or consumer products company. Their primary interaction is in support of product research and development, much as marketing research supports the


    end-users of the test results. Logical choices of test participants and appropriate statistical analyses form part of the methodological mix. Analytic tests such as the discrimination and descriptive procedures require good experimental control and maximization of test precision. Affective tests on the other hand require use of representative consumers of the products and test conditions that enable generalization to how products are experienced by consumers in the real world. Sensory tests provide useful information about the human perception of product changes due to ingredients, processing, packaging, or shelf life. Sensory evaluation departments not only interact most heavily with new product development groups but may also provide information to quality control, marketing research, packaging, and, indirectly, to other groups throughout a company (Fig. 1.4). Sensory information reduces risk in decisions about product development and strategies for meeting consumer needs. A well-functioning sensory program will be useful to a company in meeting consumer expectations and ensuring a greater chance of marketplace success. The utility of the information provided is directly related to the quality of the sensory measurement.

    . . . , sensory food science stands at the intersection of many disciplines and research traditions, and the stakeholders are many (Tuorila and Monteleone, 2009).

    Quantities derive from measurement, figures from quantities, comparisons from figures, and victory from comparisons (Sun Tzu – The Art of War (Ch. 4, v.18)).
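    The matching of method to test objective described in this summary can be sketched as a small decision function. This is only an illustrative sketch, not a procedure given in the text: the function name `select_method` and its boolean parameters are hypothetical, and the branch order simply follows the three kinds of tests named above (affective, discrimination, descriptive), with a fallback of re-opening the discussion of objectives when no question type applies.

```python
from typing import List, Tuple


def select_method(consumer_acceptability: bool,
                  analytical: bool = False,
                  same_different: bool = False,
                  nature_of_difference: bool = False) -> Tuple[str, List[str]]:
    """Return (test family, candidate methods) for a stated test objective.

    All parameter names are hypothetical labels for the questions a sensory
    specialist would ask the person requesting the test.
    """
    if consumer_acceptability:
        # Likes and dislikes: affective (hedonic) testing with consumers.
        return ("affective testing",
                ["preference/choice", "ranking", "rated acceptability"])
    if not analytical:
        # Neither a consumer nor an analytical question: objectives are unclear.
        return ("re-open discussion of objectives", [])
    if same_different:
        # Existence of an overall difference: discrimination testing.
        return ("discrimination testing",
                ["overall difference tests",
                 "n-alternative forced choice",
                 "rated difference from control"])
    if nature_of_difference:
        # Specification of attributes: descriptive analysis.
        return ("descriptive analysis",
                ["descriptive analysis techniques or modifications"])
    return ("re-open discussion of objectives", [])


if __name__ == "__main__":
    family, methods = select_method(False, analytical=True, same_different=True)
    print(family, methods)
```

    Returning the fallback explicitly, rather than raising an error, mirrors the point that a poorly specified objective calls for more communication with the end-users of the results, not for a default test.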

    References

    Aaron, J. I., Mela, D. J. and Evans, R. E. 1994. The influence of attitudes, beliefs and label information on perceptions of reduced-fat spread. Appetite, 22(1), 25-38.
    Adams, H. 1918. The Education of Henry Adams. The Modern Library, New York.
    Amerine, M. A., Pangborn, R. M. and Roessler, E. B. 1965. Principles of Sensory Evaluation of Food. Academic, New York.
    ASTM E1958. 2008. Standard guide for sensory claim substantiation. ASTM International, West Conshohocken, PA.
    ASTM. 1989. Sensory evaluation. In celebration of our beginnings. Committee E-18 on Sensory Evaluation of Materials and Products. ASTM, Philadelphia.
    Barrios, E. X. and Costell, E. 2004. Review: use of methods of research into consumers’ opinions and attitudes in food research. Food Science and Technology International, 10, 359-371.

    Giboreau, A. and Fleury, H. 2009. A new research platform to contribute to the pleasure of eating and healthy food behaviors through academic and applied food and hospitality research. Food Quality and Preference, 20, 533-536.
    Giménez, A., Ares, G. and Gámbaro, A. 2008. Consumer attitude toward shelf-life labeling: does it influence acceptance? Journal of Sensory Studies, 23, 871-883.
    Hein, K. A., Hamid, N., Jaeger, S. R. and Delahunty, C. M. 2009. Application of a written scenario to evoke a consumption context in a laboratory setting: effects on hedonic ratings. Food Quality and Preference. doi:10.1016/j.foodqual.2009.10.003
    Helm, E. and Trolle, B. 1946. Selection of a taste panel. Wallerstein Laboratory Communications, 9, 181-194.
    Jones, L. V., Peryam, D. R. and Thurstone, L. L. 1955. Development of a scale for measuring soldier’s food preferences. Food Research, 20, 512-520.
    Kamen, J. 1989. Observations, reminiscences and chatter. In: Sensory Evaluation. In celebration of our Beginnings. Committee E-18 on Sensory Evaluation of Materials and Products. ASTM, Philadelphia, pp. 118-122.
    Kimura, A., Wada, Y., Tsuzuki, D., Goto, S., Cai, D. and Dan, I. 2008. Consumer valuation of packaged foods. Interactive effects of amount and accessibility of information. Appetite, 51, 628-634.
    Lawless, H. T. 1993. The education and training of sensory scientists. Food Quality and Preference, 4, 51-63.
    Lawless, H. T. and Claassen, M. R. 1993. The central dogma in sensory evaluation. Food Technology, 47(6), 139-146.
    Lawless, H. T. and Klein, B. P. 1989. Academic vs. industrial perspectives on sensory evaluation. Journal of Sensory Studies, 3, 205-216.
    Lee, L., Frederick, S. and Ariely, D. 2006. Try it, you’ll like it. Psychological Science, 17, 1054-1058.
    MacRae, R. W. and Geelhoed, E. N. 1992. Preference can be more powerful than detection of oddity as a test of discriminability. Perception and Psychophysics, 51, 179-181.
    Meilgaard, M., Civille, G. V. and Carr, B. T. 2006. Sensory Evaluation Techniques. Fourth edition. CRC, Boca Raton.
    Meiselman, H. L. 1993. Critical evaluation of sensory techniques. Food Quality and Preference, 4, 33-40.
    Mielby, L. H. and Frøst, M. B. 2009. Expectations and surprise in a molecular gastronomic meal. Food Quality and Preference. doi:10.1016/j.foodqual.2009.09.005
    Moskowitz, H. R., Beckley, J. H. and Resurreccion, A. V. A. 2006. Sensory and Consumer Research in Food Product Design and Development. Wiley-Blackwell, New York.
    Moskowitz, H. R. 1983. Product Testing and Sensory Evaluation of Foods. Food and Nutrition, Westport, CT.
    Oliver, T. 1986. The Real Coke, The Real Story. Random House, New York.
    O’Mahony, M. 1988. Sensory difference and preference testing: The use of signal detection measures. Chapter 8 In: H. R. Moskowitz (ed.), Applied Sensory Analysis of Foods. CRC, Boca Raton, FL, pp. 145-175.

    O’Mahony, M. 1979. Psychophysical aspects of sensory analysis of dairy products: a critique. Journal of Dairy Science, 62, 1954-1962.
    Pangborn, R. M. and Dunkley, W. L. 1964. Laboratory procedures for evaluating the sensory properties of milk. Dairy Science Abstracts, 26, 55-121.
    Park, H. S. and Lee, S. Y. 2003. Genetically engineered food labels, information or warning to consumers? Journal of Food Products Marketing, 9, 49-61.
    Peryam, D. R. and Swartz, V. W. 1950. Measurement of sensory differences. Food Technology, 4, 390-395.
    Plotto, A., Barnes, K. W. and Goodner, K. L. 2006. Specific anosmia observed for β-ionone, but not for α-ionone: Significance for flavor research. Journal of Food Science, 71, S401-S406.
    Shepherd, R., Sparks, P., Belleir, S. and Raats, M. M. 1991/1992. The effects of information on sensory ratings and preferences: The importance of attitudes. Food Quality and Preference, 3, 1-9.
    Sidel, J. L., Stone, H. and Bloomquist, J. 1981. Use and misuse of sensory evaluation in research and quality control. Journal of Dairy Science, 61, 2296-2302.
    Siegrist, M. and Cousin, M-E. 2009. Expectations influence sensory experience in a wine tasting. Appetite, 52, 762-765.
    Skinner, E. Z. 1989. (Commentary). Sensory evaluation. In celebration of our beginnings. Committee E-18 on Sensory Evaluation of Materials and Products. ASTM, Philadelphia, pp. 58-65.
    Stone, H. and Sidel, J. L. 2004. Sensory Evaluation Practices, Third Edition. Academic, San Diego.
    Stone, H., Sidel, J., Oliver, S., Woolsey, A. and Singleton, R. C. 1974. Sensory evaluation by quantitative descriptive analysis. Food Technology 28(1), 24, 26, 28, 29, 32, 34.
    Sun Tzu (Sun Wu) 1963 (trans.), orig. circa 350 B.C.E. The Art of War. S.B. Griffith, trans. Oxford University.
    Szczesniak, A. S., Loew, B. J. and Skinner, E. Z. 1975. Consumer texture profile technique. Journal of Food Science, 40, 1253-1257.
    Tuorila, H. and Monteleone, E. 2009. Sensory food science in the changing society: opportunities, needs and challenges. Trends in Food Science and Technology, 20, 54-62.
    Wysocki, C. J. and Labows, J. 1984. Individual differences in odor perception. Perfumer and Flavorist, 9, 21-24.
    Yeomans, M. R., Chambers, L., Blumenthal, H. and Blake, A. 2008. The role of expectation in sensory and hedonic evaluation: The case of salmon smoked ice-cream. Food Quality and Preference, 19, 565-573.
    York, R. K. 1995. Quality assessment in a regulatory environment. Food Quality and Preference, 6, 137-141.
    Zellner, D. A., Strickhouser, D. and Tornow, C. E. 2004. Disconfirmed hedonic expectations produce perceptual contrast, not assimilation. The American Journal of Psychology, 117, 363-387.

    Chapter 2

    Physiological and Psychological Foundations of Sensory Function

    Abstract This chapter reviews background material underpinning sensory science and sensory evaluation methodologies. Basic and historical psychophysical methods are reviewed as well as the anatomy, physiology, and function of the chemical senses. The chapter concludes with a discussion of multi-modal sensory interactions.

    There is no conception in man’s mind which hath not at first, totally or in parts, been begotten upon by the organs of sense.
    -Thomas Hobbes, Leviathan (1651)

    Contents
    2.1 Introduction
    2.2 Classical Sensory Testing and Psychophysical Methods
        2.2.1 Early Psychophysics
        2.2.2 The Classical Psychophysical Methods
        2.2.3 Scaling and Magnitude Estimation
        2.2.4 Critiques of Stevens
        2.2.5 Empirical Versus Theory-Driven Functions
        2.2.6 Parallels of Psychophysics and Sensory Evaluation
    2.3 Anatomy and Physiology and Functions of Taste
        2.3.1 Anatomy and Physiology
        2.3.2 Taste Perception: Qualities
        2.3.3 Taste Perception: Adaptation and Mixture Interactions
        2.3.4 Individual Differences and Taste Genetics
    2.4 Anatomy and Physiology and Functions of Smell
        2.4.1 Anatomy and Cellular Function
        2.4.2 Retronasal Smell
        2.4.3 Olfactory Sensitivity and Specific Anosmia
        2.4.4 Odor Qualities: Practical Systems
        2.4.5 Functional Properties: Adaptation, Mixture Suppression, and Release
    2.5 Chemesthesis
        2.5.1 Qualities of Chemesthetic Experience
        2.5.2 Physiological Mechanisms of Chemesthesis
        2.5.3 Chemical “Heat”
        2.5.4 Other Irritative Sensations and Chemical Cooling
        2.5.5 Astringency
        2.5.6 Metallic Taste
    2.6 Multi-modal Sensory Interactions
        2.6.1 Taste and Odor Interactions
        2.6.2 Irritation and Flavor
        2.6.3 Color-Flavor Interactions
    2.7 Conclusions
    References

    2.1 Introduction

    In order to design effective sensory tests and provide insightful interpretation of the results, a sensory professional must understand the functional properties of the sensory systems that are responsible for the data. By a functional property, we mean a phenomenon like mixture interactions such as masking or suppression. Another example is sensory adaptation, a commonly observed decrease in responsiveness to conditions of more or less constant stimulation. In addition, it is useful to understand the anatomy and physiology of the senses involved as well as their functional limitations. A good example of a functional limitation is the threshold or minimal amount of a stimulus needed for perception. Knowing about the anatomy of the senses

    H.T. Lawless, H. Heymann, Sensory Evaluation of Food, Food Science Text Series, DOI 10.1007/978-1-4419-6488-5_2, © Springer Science+Business Media, LLC 2010


    can help us understand how consumers and panelists interact with the products to stimulate their senses and by what routes. Different routes of smelling, for example, are the orthonasal or sniffing route, when odor molecules enter the nose from the front (nostrils), versus retronasal smell, when odor molecules pass into the nose from the mouth or from breathing out, and thus have a reversed airflow pathway from that of external sniffing. Another basic area that the sensory professional should have as background knowledge involves the sensory testing methods and human measurement procedures that are the historical antecedents to the tests we do today. This is part of the science of psychophysics, the quantification and measurement of sensory experiences. Psychophysics is a very old discipline that formed the basis for the early studies in experimental psychology. Parallels exist between psychophysics and sensory evaluation. For example, the difference test using paired comparisons is a version of the method used for measuring difference thresholds called the method of constant stimuli. In descriptive analysis with trained panels, we work very hard to insure that panelists use singular uni-dimensional scales. These numerical systems usually refer to a single sensory continuum like sweetness or odor strength and are thus based on changes in perceived intensity. They do not consider multiple attributes and fold them into a single score like the old-quality grading methods. Thus there is a clear psychophysical basis for the attribute scales used in descriptive analysis. This chapter is designed to provide the reader some background in the sensory methods of psychophysics. A second objective is to give an overview of the structure and function of the chemical senses of taste, smell, and the chemesthetic sense. Chemesthesis refers to chemically induced sensations that seem to be at least partly tactile in nature, such as pepper heat, astringency, and chemical cooling. 
These three senses together comprise what we loosely call flavor and are the critical senses for appreciating foods, along with the tactile, force, and motion-related experiences that are part of food texture and mouthfeel. Texture is dealt with in Chapter 11 and color and appearance evaluations in Chapter 12. The auditory sense is not a large part of food perception, although many sounds can be perceived when we eat or manipulate foods. These provide another sense modality to accompany and reinforce our texture perceptions, as in the case of crisp

    or crunchy foods, or the audible hissing sound we get from carbonated beverages (Vickers, 1991). One growing area of interest in the senses concerns our human biodiversity, differences among people in sensory function. These differences can be due to genetic, dietary/nutritional, physiological (e.g., aging), or environmental factors. The research into the genetics of the chemical senses, for example, has experienced a period of enormous expansion since the first edition of this book. The topic is too large and too rapidly changing to receive a comprehensive treatment here. We will limit our discussion of individual differences and genetic factors to those areas that are well understood, such as bitter sensitivity, smell blindness, and color vision anomalies. The sensory practitioner should be mindful that people exist in somewhat different sensory worlds. These differences contribute to the diversity of consumer preferences. They also limit the degree to which a trained panel can be “calibrated” into uniform ways of responding. Individual differences can impact sensory evaluations in many ways.

    2.2 Classical Sensory Testing and Psychophysical Methods

    2.2.1 Early Psychophysics

    The oldest branch of experimental psychology is that of psychophysics, the study of relationships between physical stimuli and sensory experience. The first true psychophysical theorist was the nineteenth-century German physiologist, E. H. Weber. Building on earlier observations by Bernoulli and others, Weber noted that the amount that a physical stimulus needed to be increased to be just perceivably different was a constant ratio. Thus 14.5 and 15 ounces could be told apart, but with great difficulty, and the same could be said of 29 and 30 ounces or 14.5 and 15 drams (Boring, 1942). This led to the formulation of Weber’s law, generally written nowadays as

    $$\frac{\Delta I}{I} = k \tag{2.1}$$

    where $\Delta I$ is the increase in the physical stimulus that was required to be just discriminably different from some starting level, $I$. The fraction, $\Delta I / I$, is sometimes called the “Weber fraction” and is an index of


    how well the sensory system detects changes. This relationship proved generally useful and provided the first quantitative operating characteristic of a sensory system. Methods for determining the difference threshold or just-noticeable-difference (j.n.d.) values became the stock in trade of early psychological researchers. These methods were codified by G. T. Fechner in a book called Elemente der Psychophysik (Elements of Psychophysics) in 1860. Fechner was a philosopher as well as a scientist and developed an interest in Eastern religions, in the nature of the soul, and in the Cartesian mind-body dichotomy. Fechner’s broader philosophical interests have been largely overlooked, but his little book on sensory methods was to become a classic text for the psychology laboratory. Fechner also had a valuable insight. He realized that the j.n.d. might be used as a unit of measurement and that by adding up j.n.d.s one could construct a psychophysical relationship between physical stimulus intensity and sensory intensity. This relationship approximated a log function, since the integral of 1/x dx is proportional to the natural log of x. So a logarithmic relationship appeared useful as a general psychophysical “law:” S = k log I

    (2.2)

    where S is sensation intensity and I is once again the physical stimulus intensity. This relationship known as Fechner’s law was to prove a useful rule of thumb for nearly 75 years, until it was questioned by acoustical researchers who supplanted it with a power law (see Section 2.2.3).
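The arithmetic behind Weber's fraction and Fechner's summing of j.n.d.s can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the Weber fraction of 0.1 used for counting steps is an assumed value, not one taken from the text.

```python
import math

def weber_fraction(start, just_different):
    """Weber fraction k = delta_I / I for a just-discriminable pair."""
    return (just_different - start) / start

def count_jnds(i_low, i_high, k):
    """Count the j.n.d. steps that fit between two intensities.

    Each step multiplies the current level by (1 + k), as Weber's law implies.
    """
    steps = 0
    level = i_low
    while level * (1 + k) <= i_high:
        level *= 1 + k
        steps += 1
    return steps

def jnds_closed_form(i_low, i_high, k):
    """Logarithmic (Fechner-style) expression for the same count."""
    return math.log(i_high / i_low) / math.log(1 + k)

# The two weight pairs from the text share one Weber fraction (about 0.034).
k_light = weber_fraction(14.5, 15.0)
k_heavy = weber_fraction(29.0, 30.0)

# With an assumed k of 0.1, a tenfold range holds 24 steps and a
# hundredfold range holds 48: equal ratios give equal numbers of j.n.d.s.
steps_tenfold = count_jnds(1.0, 10.0, 0.1)
steps_hundredfold = count_jnds(1.0, 100.0, 0.1)
```

Doubling the number of steps when the intensity ratio is squared is the logarithmic behavior that Eq. (2.2) summarizes.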

    2.2.2 The Classical Psychophysical Methods

    Fechner’s enduring contribution was to assemble and publish the details of sensory test methods and how several important operating characteristics of sensory systems could be measured. Three important methods were the method of limits, the method of constant stimuli (called the method of right and wrong cases in those days), and the method of adjustment or average error (Boring, 1942). The methods are still used today in some research situations and variations on these methods form part of the toolbox of applied sensory evaluation. Each of the three methods was associated with a particular type of measured response of sensory systems. The method of limits was well suited to determine absolute or detection thresholds. The method of constant stimuli could be used to determine difference thresholds and the method of adjustment to establish sensory equivalence.

    In the method of limits the physical stimulus is changed by successive discrete steps until a change in response is noted. For example, when the stimulus is increasing in intensity, the response will change from “no sensation” to “I detect something.” When the stimulus is decreasing in intensity, at some step the response will change back to “no sensation.” Over many trials, the average point of change can be taken as the person’s absolute threshold (see Fig. 2.1). This is the minimum intensity required for detection of the stimulus. Modern variations on this method often use only an ascending series and force the participants to choose a target sample among alternative “blank” samples at each step. Each concentration must be discriminated from a background level such as plain water in the case of taste thresholds. Forced-choice methods for determining thresholds are discussed in detail in Chapter 6.

    In the method of constant stimuli, the test stimulus is always compared against a constant reference level (a standard), usually the middle point on a series of physical intensity levels. The subject’s job is to respond to each test item as “greater than” or “less

    Fig. 2.1 An example of the method of limits. [Plot: concentration (0.25 to 16.0, in twofold steps) on the vertical axis against trial (A1, D1, A2, D2, A3, D3) on the horizontal axis.] The circled reversal points would be averaged to obtain the person’s threshold. A: ascending series. D: descending series. In taste and smell, only ascending series are commonly used to prevent fatigue, adaptation, or carry-over of persistent sensations.
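The averaging of reversal points in the method of limits can be sketched as follows. The data are invented for illustration, and taking the reversal point to be the first concentration at which the response changes is an assumption; other conventions (such as the midpoint between the two bracketing steps) are equally plausible.

```python
from statistics import mean

def reversal_point(series):
    """Return the concentration at which the response first changes.

    `series` is a list of (concentration, detected) pairs in presentation order.
    """
    for (_, previous), (concentration, response) in zip(series, series[1:]):
        if response != previous:
            return concentration
    raise ValueError("no reversal in this series")

# Hypothetical ascending (A) and descending (D) series in twofold steps.
a1 = [(0.25, False), (0.5, False), (1.0, False), (2.0, True)]
d1 = [(16.0, True), (8.0, True), (4.0, True), (2.0, True), (1.0, False)]
a2 = [(0.25, False), (0.5, False), (1.0, True)]
d2 = [(16.0, True), (8.0, True), (4.0, True), (2.0, False)]

# Average the reversal points across all series to estimate the threshold.
threshold = mean(reversal_point(s) for s in (a1, d1, a2, d2))
```

With these made-up responses the reversal points are 2.0, 1.0, 1.0, and 2.0, so the estimated threshold is 1.5 concentration units.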


    2

    Physiological and Psychological Foundations of Sensory Function

    Fig. 2.2 A psychometric function derived from the method of constant stimuli, a repeated series of paired comparisons against a constant (standard) stimulus, in this case 10% sucrose. [Plot: frequency (percent) judged “sweeter” than the standard on the vertical axis against sucrose concentration (2 to 18%) on the horizontal axis, with the standard and the UDL marked.] The frequency of judgments in which the comparison stimulus is judged sweeter than the standard is plotted against concentration. The difference threshold is determined by the concentration difference between the standard and the interpolated 75% (or 25%) point. UDL: upper difference limen (threshold).
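Reading a difference threshold off a psychometric function amounts to interpolating between tested concentrations. The sketch below uses invented percentages and simple linear interpolation; both the data and the helper name are assumptions made for illustration, and the data must rise strictly with concentration.

```python
def interpolate_concentration(points, target_percent):
    """Linearly interpolate the concentration giving `target_percent`.

    `points` is a list of (concentration, percent judged sweeter) pairs,
    sorted by concentration and strictly increasing in percent.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target_percent <= y1:
            return x0 + (target_percent - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target percent lies outside the tested range")

# Hypothetical results against a 10% sucrose standard.
standard = 10.0
results = [(6.0, 10.0), (8.0, 25.0), (10.0, 50.0), (12.0, 70.0), (14.0, 90.0)]

# Upper and lower difference limens, measured from the standard.
upper_limen = interpolate_concentration(results, 75.0) - standard
lower_limen = standard - interpolate_concentration(results, 25.0)
```

Here the 75% point falls at 12.5% sucrose and the 25% point at 8%, giving difference thresholds of 2.5 and 2.0 percentage points above and below the standard.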

    than” the standard. Many replications of each intensity level are presented. The percentage of times the response is “greater than” can be plotted as in Fig. 2.2. This S-shaped curve is called a psychometric function (Boring, 1942). The difference threshold was taken as the difference between the 50 and 75% points interpolated on the function. The method of constant stimuli bears a strong resemblance to current techniques of paired comparison, with two exceptions. One point of difference is that the method was geared toward interval estimation, rather than testing for statistically significant differences. That is, the technique estimated points on the psychometric function (25, 50, and 75%) and researchers were not concerned with statistical significance of difference tests. Also, a range of comparison stimuli was tested against the standard and not just a single paired comparison of products.

    The third major method in classical psychophysics was the method of adjustment or average error. The subject was given control over a variable stimulus like a light or a tone and asked to match a standard in brightness or loudness. The method could be used to determine difference thresholds based on the variability of the subject over many attempts at matching, for example, using the standard deviation as a measure of difference threshold. A modern application is in measuring sensory tradeoff relationships. In this type of experiment the duration of a very brief tone could be balanced against a varying sound pressure level to yield a constant perception of loudness. Similarly, the duration of a flash of light could be traded off against its photometric intensity to create a constant perceived brightness. For very brief tones or brief flashes, there is summation of the intensity over time in the nervous system, so that increasing duration can be balanced against decreasing physical intensity to create a constant perception. These methods have proven useful in understanding the physiological response of different senses to the temporal properties of stimuli, for example, how the auditory and visual systems integrate energy over time.

    Adjustment methods have not proven so useful for assessing sensory equivalence in applied food testing, although adjustment is one way of trying to optimize an ingredient level (Hernandez and Lawless, 1999; Mattes and Lawless, 1985). Pangborn and co-workers employed an adjustment method to study individual preferences (Pangborn, 1988; Pangborn and Braddock, 1989). Adding flavors or ingredients “to taste” at the benchtop is a common way of initially formulating products. It is also fairly common to make formula changes to produce approximate sensory matches to some target, either a standard formula or perhaps some competitor’s successful product. However, the method as applied in the psychophysics laboratory is an unwieldy technique for the senses of taste and smell, where elaborate equipment is needed to provide adjustable stimulus control. So methods of equivalency adjustment are somewhat rare in food testing.

    2.2.3 Scaling and Magnitude Estimation

    A very useful technique for sensory measurement has been the direct application of rating scales to measure the intensity of sensations. Historically known as the “method of single stimuli,” the procedure is highly cost efficient since one stimulus presentation yields one data point. This is in contrast to a procedure like the method of constant stimuli, where the presentation of many pairs is necessary to give a frequency count of the number of times each level is judged stronger than a standard. Rating scales have many uses. One of the most common is to specify a psychophysical function, a quantitative relationship between the perceived intensity of a sensation and the physical intensity of the stimulus. This is another way of describing a dose-response curve or, in other words, capturing the input-output function of a sensory system over its dynamic range.

    The technique of magnitude estimation grew out of earlier procedures in which subjects would be asked to fractionate an adjustable stimulus. For example, a subject would be asked to adjust a light or tone until it seemed half as bright as a comparison stimulus. The technique was modified so that the experimenter controlled the stimulus and the subject responded using (unrestricted) numbers to indicate the proportions or ratios of the perceived intensities. Thus if the test stimulus was twice as bright as the standard, it would be assigned a number twice as large as the rating for the standard and if one-third as bright, a number one-third as large.

    An important observation in S. S. Stevens’ laboratory at Harvard was that the loudness of sounds was not exactly proportional to the decibel scale. If Fechner’s log relationship was correct, rated loudness should grow in a linear fashion with decibels, since they are a log scale of sound pressure relative to a reference (dB = 20 log (P/P0), where P is the sound pressure and P0 is the reference sound pressure, usually a value for absolute threshold). However, discrepancies were observed between decibels and loudness proportions. Instead, Stevens found with the direct magnitude estimation procedure that loudness was a power function of stimulus intensity, with an exponent of about 0.6. Scaling of other sensory continua also gave power functions, each with its characteristic exponent (Stevens, 1957, 1962). Thus the following relationship held:

    $$S = kI^{n} \quad \text{or} \quad \log S = n \log I + \log k \tag{2.3}$$

    where n was the characteristic exponent and k was a proportionality constant determined by the units of measurement. In other words, the function formed a straight line in a log-log plot with the exponent equal to the slope of the linear function. This was in contrast to the Fechnerian log function, which was a straight line in a semilog plot (response versus log physical intensity). One of the more important characteristics of a power function is that it can accommodate relationships that are expanding or positively accelerated, while the log function cannot. The power function with an exponent less than one fits a law of diminishing returns, i.e., larger and larger physical increases are required to maintain a constant proportional increase in the sensation level. Other continua such as response to electric shocks and some tastes were found to have a power function exponent greater than one (Meiselman, 1971; Moskowitz, 1971; Stevens, 1957). A comparison of power functions with different exponents is shown in Fig. 2.3.

    Many sensory systems show an exponent less than one. This shows a compressive energy relationship that may have adaptive value for an organism responding to a wide range of energy in the environment. The range from the loudest sound one can tolerate to the faintest audible tone is over 100 dB. This represents over 10 log units of sound energy, a ratio of 10 billion to one. The dynamic range for the visual response of the eyes to different levels of light energy is equally broad. Thus exponents less than one have ecological significance for sensory systems that are tuned to a broad range of physical energy levels.
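Because a power function is a straight line in log-log coordinates, its exponent can be recovered as the slope of a least-squares line through log-transformed data. The sketch below uses synthetic ratings generated from an exponent of 0.6 and an arbitrary constant of 10; the function name and the data are assumptions for illustration, not values from any study. It also checks the decibel arithmetic quoted above.

```python
import math

def fit_power_law(intensities, sensations):
    """Fit S = k * I**n by least squares on log10-transformed data.

    Returns (n, k): the slope and the antilog of the intercept.
    """
    xs = [math.log10(i) for i in intensities]
    ys = [math.log10(s) for s in sensations]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, 10 ** intercept

# Synthetic magnitude estimates that follow a power law exactly.
intensities = [1.0, 10.0, 100.0, 1000.0]
sensations = [10.0 * i ** 0.6 for i in intensities]
exponent, constant = fit_power_law(intensities, sensations)

# A 100 dB range: dB = 20 log10(P/P0), and energy goes as pressure squared.
pressure_ratio = 10 ** (100 / 20)
energy_ratio = pressure_ratio ** 2
```

The 100 dB range corresponds to a pressure ratio of 10^5 and therefore an energy ratio of 10^10, the "10 billion to one" figure in the text.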


    [Fig. 2.3 Power functions with different exponents. Vertical axis: psychological magnitude, S. Curves: electric shock (n > 1), apparent length (n = 1), brightness (n < 1).]

    $$H_A: P > \tfrac{1}{2} \tag{4.3}$$

    The verbal form of the alternative hypothesis is that the population would be correct (saying that AB and BA pairs are different and that AA and BB pairs are the same) more than half the time. The results of the paired difference test will only indicate whether the panelists could significantly discriminate between the samples. Unlike the paired directional test, no specification or direction of difference is indicated. In other words, the sensory scientist will only know that the samples are perceptibly different but not in which attribute(s) the samples differed. An alternative analysis is presented in the Appendix to this chapter, where each panelist sees an identical pair (AA or BB) and one test pair (AB or BA) in randomized sequence.

    4.2.2 Triangle Tests

    In the triangle test, three samples are presented simultaneously to the panelists; two samples are from the same formulation and one is from the different formulation. Each panelist has to indicate either which sample is the odd sample or which two samples are most similar. The usual form of the score sheet asks the panelist to indicate the odd sample. However, some sensory specialists will ask the panelist to indicate the pair of similar samples. It probably does not matter which question is asked. However, the sensory specialist should not change the format when re-using panelists since they will get confused. See Fig. 4.3 for a sample score sheet. As in the paired difference test, the panelist must be trained to understand the task as described by the score sheet. The null hypothesis for the triangle test states that the long-run probability ($P_t$) of making a correct selection when there is no perceptible difference between the samples is one in three ($H_0: P_t = 1/3$). The alternative hypothesis states that the probability that the underlying population will make the correct decision

    Fig. 4.3 Example of a triangle score sheet.


    $$H_A: P_t > \tfrac{1}{3} \tag{4.4}$$

    This is a one-sided alternative hypothesis and the test is one tailed. In this case there are six possible serving orders (AAB, ABA, BAA, BBA, BAB, ABB) which should be counterbalanced across all panelists. As with the difference paired comparison, the triangle test allows the sensory specialist to determine if two samples are perceptibly different but the direction of the difference is not indicated by the triangle test. Again, the sensory scientist will only know that the samples are perceptibly different but not in which attribute(s) the samples differed.
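The six serving orders and their counterbalancing can be sketched as below. The helper names are hypothetical, and cycling through the orders is only one simple way to balance them; in practice the assignment of orders to panelists would usually also be randomized.

```python
from itertools import permutations

def triangle_serving_orders(a="A", b="B"):
    """List every distinct serving order for a triangle test."""
    orders = set(permutations([a, a, b])) | set(permutations([a, b, b]))
    return sorted("".join(order) for order in orders)

def assign_orders(n_panelists):
    """Cycle through the serving orders so each is used equally often
    (exactly equal only when n_panelists is a multiple of six)."""
    orders = triangle_serving_orders()
    return [orders[i % len(orders)] for i in range(n_panelists)]

orders = triangle_serving_orders()
plan = assign_orders(12)
```

With 12 panelists each of the six orders is served exactly twice.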

    Discrimination Testing

    $$H_A: P > \tfrac{1}{2} \tag{4.5}$$

    Again, the panelists should be trained to perform the task as described by the score sheet correctly. Duo-trio tests allow the sensory specialist to determine if two samples are perceptibly different but the direction of the difference is not indicated by the duo-trio test. In other words, the sensory scientist will only know that the samples are perceptibly different but not in which attribute(s) the samples differed. There are two formats to the duo-trio test, namely the constant reference duo-trio test and the balanced reference duo-trio test. From the point of view of the panelists the two formats of the duo-trio test are identical (see Figs. 4.4a and b), but to the sensory specialist the two formats differ in the sample(s) used as the reference.

    4.2.3.1 Constant Reference Duo-Trio Test

    In this case, all panelists receive the same sample formulation as the reference. The constant reference duo-trio test has two possible serving orders (R_A BA, R_A AB) which should be counterbalanced across all panelists. The constant reference duo-trio test seems to be more sensitive, especially if the panelists have had prior experience with the product (Mitchell, 1956). For example, if product X is the current formulation (familiar to the panelists) and product Z is a new reformulation, then a constant reference duo-trio test with product X as reference would be the method of choice.

    Before starting please rinse your mouth with water and expectorate. There are three samples in each of the two duo-trio sets for you to evaluate. In each set, one of the coded pairs is the same as the reference. For each set taste the reference first. Then taste each of the coded samples in the sequence presented, from left to right. Take the entire sample in your mouth. NO RETASTING. Circle the number of the sample which is most similar to the reference. Do not swallow any of the sample or the water. Expectorate into the container provided. Rinse your mouth with water between sets 1 and 2.

    Fig. 4.4a Example of a constant reference duo-trio score sheet (Set 1 and Set 2, each with its own reference).

    4.2 Types of Discrimination Tests

    Fig. 4.4b Example of a balanced reference duo-trio score sheet.

    reference and the other half of the panelists receive the other sample formulation as the reference. In this case, there are four possible serving orders (R_A BA, R_A AB, R_B AB, R_B BA) which should be counterbalanced across all panelists. This method is used when both products are prototypes (unfamiliar to the panelists) or when there is not a sufficient quantity of the more familiar product to perform a constant reference duo-trio test.

    differ in the specified dimension and which sample is higher in perceived intensity of the specified attribute. The danger is that other sensory changes will occur in a food when one attribute is modified and these may obscure the attribute in question. Another version of the n-AFC asks panelists to pick out the weakest or strongest in overall intensity, rather than in a specific attribute. This is a very difficult task for panelists when they are confronted with a complex food system.

    4.2.5 A-Not-A Tests

    There are two types of A-not-A tests referenced in the literature. The first and more commonly used version has a training phase with the two products followed by a monadic evaluation phase (Bi and Ennis, 2001a, b); we will call this the standard A-not-A test. The second version is essentially a sequential paired difference test or simple difference test (Stone and Sidel, 2004), which we will call the alternate A-not-A test. The alternate A-not-A test is not frequently used. In the next section we will discuss the alternate A-not-A test first since the statistical analysis for this version is similar to that of the paired comparison discrimination test. The statistical analyses for the various standard A-not-A tests are based on a different theory and are somewhat more complex; they will be discussed later.

    4.2.5.1 Alternate A-Not-A Test

    This is a sequential same/difference paired difference test where the panelist receives and evaluates the first


    Fig. 4.5 Example of a three-alternative forced choice score sheet.

    relevant to the objective of the study. However, the differences in color or shape or size have to be very subtle and only obvious when the samples are presented simultaneously. If the differences are not subtle, the panelists are likely to remember them and will make their decision based on these extraneous differences.

    4.2.5.2 Standard A-Not-A Test

    Panelists inspect multiple examples of products that are labeled “A” and usually also products that are labeled “not-A.” Thus there is a learning period. Once the training period has been completed, the panelists receive samples one at a time and are asked whether each one is A or not-A. As discussed by Bi and Ennis (2001a), the standard A-not-A test potentially has four different designs. For the monadic A-not-A test the panelist, after the training phase, is presented with a single sample (either A or not-A). In the paired A-not-A version the panelist, after completion of the training phase, is presented with a pair of samples, sequentially (one A and one not-A, counterbalanced across panelists). In the replicated monadic A-not-A version the panelist, after completion of training, receives a series of samples of either A or not-A but not both. This version is rarely used in practice. Lastly, in the replicated mixed A-not-A version the panelist, after completion of training, receives a series of A and not-A samples. Each of these different formats requires different statistical models and using an inappropriate model could lead to a misleading conclusion.

    As described by Bi and Ennis (2001a), “The statistical models for the A-Not A method are different from that of other discrimination methods such as the m-AFC, the triangle, and the duo-trio methods.” “Pearson’s and McNemar’s chi-square statistics with one degree of freedom can be used for the standard A-Not A method while binomial tests based on the proportion of correct responses can be used for the m-AFC, the triangle, and the duo-trio methods. The basic difference between the two types of difference tests is that the former involves a comparison of two proportions (i.e., the proportion of “A” responses for the A sample versus that for the Not A sample) or testing independence of two variables (sample and response) while the latter is a comparison of one proportion with a fixed value (i.e., the proportion of correct responses versus the guessing probability).” Articles by Bi and Ennis (2001a, b) clearly describe data analysis methods for these tests. Additionally, the article by Brockhoff and Christensen (2009) describes an R package called sensR (http://www.cran.r-project.org/package=sensR/) that may be used for the data analyses of some standard A-not-A tests. The data analyses associated with the standard A-not-A tests are beyond the scope of this textbook, but see the Appendix of this chapter, which shows the application of the McNemar chi-square for a simple A-not-A test where each panelist received one standard product (a “true” example of A) and one test product. Each is presented separately and a judgment is collected for both products.

    4.2.6 Sorting Methods

    In sorting tests the panelists are given a series of samples and they are asked to sort them into two groups. The sorting tests can be extremely fatiguing and are not frequently used for taste and aroma sensory evaluation, but they are used when sensory specialists want to determine if two samples are perceptibly different in tactile or visual dimensions. The sorting tests are statistically very efficient since the long-run probability of a correct sort by chance, under the null hypothesis, can be very small. For example, the chance probability for the two-out-of-five test is 1 in 10 (P2/5 = 0.1) and for the Harris-Kalmus test it is 1 in 70 (P4/8 = 0.0143). These tests are discussed below.

    4.2.6.1 The Two-Out-of-Five Test

    The panelists receive five samples and are asked to sort the samples into two groups, one group should contain the two samples that are different from the other

    4.2.6.2 The Harris-Kalmus Test


    4.2.8 Dual-Standard Test

    The dual standard was first used by Peryam and Swartz (1950) with odor samples. It is essentially a duo-trio test with two reference standards, the control and the variant. The two standards allow the panelists to create a more stable criterion as to the potential difference between the samples. The potential serving orders for this test are R(A) R(B) AB, R(A) R(B) BA, R(B) R(A) AB, and R(B) R(A) BA. The probability of guessing the correct answer by chance is 0.5 and the data analyses for this test are identical to those of the duo-trio test. Peryam and Swartz felt quite strongly that the technique would work best with odor samples due to the relatively quick recovery and that the longer recovery associated with taste samples would preclude the use of the test. The test was used by Pangborn and Dunkley (1966) to detect additions of lactose, algin gum, milk salts, and proteins to milk. O’Mahony et al. (1986), working with lemonade, found that the dual-standard test elicited superior performance over the duo-trio test. But O’Mahony (personal communication, 2009) feels that this result is in error, since the panelists were not instructed to evaluate the standards prior to each pair evaluation and therefore the panelists were probably reverting to a 2-AFC methodology. This would be in agreement with Huang and Lawless (1998), who studied sucrose additions to orange juice and did not find superiority in performance between the dual standard and the duo-trio or the ABX tests.

    4.3 Reputed Strengths and Weaknesses of Discrimination Tests

    If the batch-to-batch variation within a sample formulation is as large as the variation between formulations then the sensory specialist should not use triangle or duo-trio tests (Gacula and Singh, 1984). In this case the paired comparison difference test could be used, but the first question that the sensory specialist should ask is whether the batch-to-batch variation should not be studied and improved prior to any study of new or different formulations.

    The major weakness of all discrimination tests is that they do not indicate the magnitude of the sensory difference(s) between the sample formulations. As the simple discrimination tests are aimed at a yes/no decision about the existence of a sensory difference, they are not designed to give information on the magnitude of a sensory difference, only whether one is likely to be perceived or not. The sensory specialist should not be tempted to conclude that a difference is large or small based on the significance level or the probability (p-value) from the statistical analysis. The significance and p-value depend in part upon the number of panelists in the test as well as the inherent difficulty of the particular type of discrimination test method. So these are not acceptable indices of the size of the perceivable difference. However, it is sensible that a comparison in which 95% of the judges answered correctly has a larger sensory difference between control and test samples than a comparison in which performance was only at 50% correct. This kind of reasoning works only if a sufficient number of judges were tested, the methods were the same, and all test conditions were constant. Methods for interval level scaling of sensory differences based on proportions of correct discriminations in forced choice tests are discussed further in Chapter 5 as Thurstonian scaling methods. These methods are indirect measures of small differences. They are also methodologically and mathematically complex and require certain assumptions to be met in order to be used effectively. Therefore we feel that the sensory specialist is wiser to base conclusions about the degree of difference between samples on scaled (direct) comparisons, rather than indirect estimates from choice performance in discrimination tests. However, there are alternative opinions in the sensory community and we suggest that interested parties read Lee and O’Mahony (2007). With the exception of the 2-AFC and 3-AFC tests, the other discrimination tests also do not indicate the nature of the sensory difference between the samples.

    The major strength of the discrimination tests is that the task that the panelists perform is quite simple and intuitively grasped by the panelists. However, it is frequently the very simplicity of these tests that leads to the generation of garbage data. Sensory specialists must be very aware of the power, replication, and counterbalancing issues associated with discrimination tests. These issues are discussed later in this chapter.

    4.4 Data Analyses

    The data from discrimination tests may be analyzed by any of the following statistical methods. The three data analyses are based on the binomial, chi-square, or normal distributions, respectively. All these analyses assume that the panelists were forced to make a choice. Thus they had to choose one sample or another and could not say that they did not know the answer. In other words, each panelist either made a correct or incorrect decision, but they all made a decision.

    4.4.1 Binomial Distributions and Tables

    The binomial distribution allows the sensory specialist to determine whether the result of the study was due to chance alone or whether the panelists actually perceived a difference between the samples. The following formula gives the probability of obtaining exactly y correct judgments out of n, given the probability of success (of making a correct decision; p) and the probability of failure (of making an incorrect decision; q) in a single judgment:

    $$P(y) = \frac{n!}{y!\,(n - y)!}\, p^{y} q^{n-y} \tag{4.6}$$

    where n = total number of judgments, y = total number of correct judgments, p = probability of making the correct judgment by chance, and q = 1 − p. In this formula, n! describes the mathematical factorial function, which is calculated as n × (n − 1) × (n − 2) × ... × 2 × 1. Before the widespread availability of calculators and computers, calculation of the binomial formula was quite complicated, and even now it remains somewhat tedious. Roessler et al. (1978) published a series of tables that use the binomial formula to calculate the number of correct judgments and their probability of occurrence. These tables make it very easy to determine if a statistical difference was detected between two samples in discrimination tests. However, the sensory scientist may not have these tables easily available, and thus should also know how to analyze discrimination data using statistical tables that are more readily available.

    We abridged the tables from Roessler et al. (1978) into Table 4.3. Using this table is very simple. For example, in a duo-trio test using 45 panelists, 21 panelists correctly matched the sample to the reference. In Table 4.3, in the section for duo-trio tests, we find that the table value for 45 panelists at 5% probability is 29. This value is larger than 21 and therefore the panelists could not detect a difference between the samples. In a different study, using a triangle test, 21 of 45 panelists correctly identified the odd sample. In Table 4.3, in the section for triangle tests, we find that the table value for 45 panelists at 5% probability is 21. This value is equal to 21 and therefore the panelists could detect a significant difference between the samples at the alpha probability of 5%.
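When the tables are not at hand, the minimum number of correct judgments can be computed directly from the binomial formula. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the critical value is taken here to be the smallest count whose one-tailed upper-tail probability under chance does not exceed alpha, which is an assumption about how the tables were built.

```python
from math import comb

def binomial_pmf(y, n, p):
    """Probability of exactly y correct judgments out of n (Eq. 4.6)."""
    q = 1 - p
    return comb(n, y) * p ** y * q ** (n - y)

def upper_tail(x, n, p):
    """Probability of x or more correct judgments by chance alone."""
    return sum(binomial_pmf(y, n, p) for y in range(x, n + 1))

def min_correct(n, p, alpha=0.05):
    """Smallest number correct that is significant at alpha (one-tailed).

    Returns None if even n correct out of n would not reach significance.
    """
    for x in range(n + 1):
        if upper_tail(x, n, p) <= alpha:
            return x
    return None

duo_trio_critical = min_correct(45, 1 / 2)   # chance probability 1/2
triangle_critical = min_correct(45, 1 / 3)   # chance probability 1/3
```

For 45 panelists this reproduces the two table values quoted in the examples above: 29 for the duo-trio test and 21 for the triangle test.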


    4.4.2 The Adjusted Chi-Square (χ²) Test

    The chi-square distribution allows the sensory scientist to compare a set of observed frequencies with a matching set of expected (hypothesized) frequencies. The chi-square statistic can be calculated from the following formula (Amerine and Roessler, 1983), which includes the number −0.5 as a continuity correction. The continuity correction is needed because the χ² distribution is continuous and the observed frequencies from discrimination tests are integers. It is not possible for one-half of a person to get the right answer and so the statistical approximation can be off by as much as 1/2, maximally.

    $$\chi^2 = \frac{(|O_1 - E_1| - 0.5)^2}{E_1} + \frac{(|O_2 - E_2| - 0.5)^2}{E_2} \tag{4.7}$$

    where
    $O_1$ = observed number of correct choices,
    $O_2$ = observed number of incorrect choices,
    $E_1$ = expected number of correct choices, and
    $E_2$ = expected number of incorrect choices.

    $E_1$ is equal to the total number of observations (n) times the probability (p) of a correct choice by chance alone in a single judgment, where
    p = 0.100 for the two-out-of-five test,
    p = 0.500 for duo-trio, paired difference, paired directional, alternate A-not-A, and ABX tests, and
    p = 0.333 for triangle tests.

    $E_2$ is equal to the total number of observations (n) times the probability (q) of an incorrect choice by chance alone in a single judgment, where q = 1 − p, so that
    q = 0.900 for the two-out-of-five test,
    q = 0.500 for duo-trio, paired difference, paired directional, alternate A-not-A, and ABX tests, and
    q = 0.667 for triangle tests.

    Discrimination tests ask only whether two products are perceived to be different, so there are two response categories (correct and incorrect) and the degrees of freedom equal one (1). Therefore, a χ² table using df = 1 should be consulted; for alpha (α) at 5% the critical χ² value is 3.84. For other alpha levels consult the chi-square table in the Appendix.

    4.4.3 The Normal Distribution and the Z-Test on Proportion

    The sensory specialist can also use the areas under the normal probability curve to estimate the probability of chance in the results of discrimination tests. The tables associated with the normal curve specify areas under the curve (probabilities) associated with specified values of the normal deviate (z). The following two formulae (Eqs. (4.8) and (4.9)) can be used to calculate the z-value associated with the results of a specific discrimination test (Stone and Sidel, 1978):

    z=

    Upper one-sided 95% confidence limit on the proportion of discriminators:

    $$\left[1.5\,(X/N) - 0.5\right] + 1.645\,(1.5)\sqrt{\frac{(X/N)(1 - X/N)}{N}} \tag{5.7}$$

    Here is a worked example. Suppose we do a triangle test with 60 panelists, and 30 get the correct answer. We can ask the following questions: What is the best estimate of the number of discriminators? The proportion of discriminators? What is the upper 95% confidence limit on the number of discriminators? What is the confidence limit on the proportion of discriminators? Finally, could we conclude that there is significant similarity, based on a maximum allowable proportion of discriminators of 50%? The solution is as follows: Let X = number correct and D = number of discriminators, so X = 30 and N = 60. We have D = 1.5(30) − 0.5(60) = 15 discriminators, or 25% of our judges detecting the difference, as our best estimate.

    5 Similarity, Equivalence Testing, and Discrimination Theory

    The standard error is given by

    \[ 1.5 \sqrt{\frac{(30/60)(1 - 30/60)}{60}} \approx 0.097 \]
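The arithmetic of the worked example can be checked with a short sketch. It assumes a triangle test (chance probability 1/3, which gives the factors 1.5 and 0.5) and a one-tailed z of 1.645; the function name `discriminator_estimate` is illustrative and does not come from the text.

```python
from math import sqrt

def discriminator_estimate(x, n, z=1.645):
    """Triangle-test estimate of the proportion of discriminators.

    x: number of correct answers, n: number of panelists.
    Returns (proportion, standard_error, upper_limit).
    """
    p_correct = x / n
    proportion = 1.5 * p_correct - 0.5           # best estimate of D/N
    std_error = 1.5 * sqrt(p_correct * (1 - p_correct) / n)
    upper_limit = proportion + z * std_error     # one-sided upper 95% bound
    return proportion, std_error, upper_limit

proportion, std_error, upper_limit = discriminator_estimate(30, 60)
n_discriminators = proportion * 60               # best estimate of D
similar = upper_limit < 0.50                     # maximum allowable proportion
```

With X = 30 and N = 60 the upper limit on the proportion comes out near 0.41, which is below the 50% maximum allowed, so a conclusion of similarity would be supported under these assumptions.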

    variation was used by Stevens et al. (1988) in a landmark paper on the individual variability in olfactory thresholds. In this case, five correct pairs were required to score the concentration as correctly detected, and this performance was confirmed at the next highest concentration level. The most striking finding of this study was that among the three individuals tested 20 times, their individual thresholds for butanol, pyridine, and phenylethylmethylethyl carbinol (a rose odorant) varied over 2,000- to 10,000-fold in concentration. Variation within an individual was as wide as the variation typically seen across a population of test subjects. This surprising result suggests that day-to-day variation in olfactory sensitivity is large and that thresholds for an individual are not very stable (for an example, see Lawless et al., 1995). More recent work using extensive testing of individuals at each concentration step suggests that these estimates of variability may be high. Walker et al. (2003) used a simple yes/no procedure (like the A, not-A test, or signal detection test) with 15 trials of targets and 15 trials of blanks at each concentration level. Using a model for statistically significant differences between blank and target trials, they were able to get sharp gradients for the individual threshold estimates. In summary, an ascending forced-choice method is a reasonably useful compromise between the need to precisely define a threshold level and the problems encountered in sensory adaptation and observer fatigue when extensive measurements are made. However, the user of an ascending forced-choice procedure should be aware of the procedural choices that can affect the obtained threshold value.
The following choices will affect the measured value: the number of alternatives (both targets and blanks); the stopping rule, that is, the number of correct steps in a row required to establish a threshold; the number of replicated correct trials required at any one step; and the rule that determines at what level of concentration steps the threshold value is assigned. For example, the individual threshold might be assigned at the lowest level correct, or at the geometric mean between the lowest level correct and the highest level incorrect. Other specific factors include the chosen step size of concentration units (factors of two or three are common in taste and smell), the method of averaging or combining replicated ascending runs on the same individual, and finally the method of averaging or combining group data. Geometric means are commonly used for the last two purposes.
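One of the assignment rules mentioned above, the geometric mean between the highest incorrect step and the next higher (correct) step, can be sketched as follows. This is only an illustration under stated assumptions: the function names and the panel data are invented, each run is assumed to end with correct responses, and runs with no incorrect step are simply rejected rather than handled by any particular standard's convention.

```python
from math import exp, log

def geometric_mean(values):
    """Geometric mean, computed on the log scale."""
    return exp(sum(log(v) for v in values) / len(values))

def individual_threshold(concentrations, correct):
    """Geometric mean of the highest incorrect step and the next higher step.

    concentrations: ascending concentration steps.
    correct: True/False for each step of one ascending run.
    """
    misses = [i for i, ok in enumerate(correct) if not ok]
    if not misses:
        raise ValueError("no incorrect step: threshold lies below the range tested")
    last_miss = misses[-1]
    if last_miss == len(correct) - 1:
        raise ValueError("top step incorrect: threshold lies above the range tested")
    return geometric_mean([concentrations[last_miss], concentrations[last_miss + 1]])

# Hypothetical ascending runs for three panelists (factor-of-two steps).
steps = [2, 4, 8, 16, 32, 64]
runs = [
    [False, True, False, True, True, True],   # last miss at 8  -> sqrt(8 * 16)
    [False, False, True, True, True, True],   # last miss at 4  -> sqrt(4 * 8)
    [True, False, False, False, True, True],  # last miss at 16 -> sqrt(16 * 32)
]
individual = [individual_threshold(steps, run) for run in runs]
group = geometric_mean(individual)            # group value as a geometric mean
```

Averaging on the log scale keeps one very insensitive panelist from dominating the group value, which is one reason geometric means suit concentration series with multiplicative steps.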

    6 Measurement of Sensory Thresholds

    6.7 Probit Analysis


    It is often useful to apply some kind of transformation or graphing method to the group data to linearize the curve used to find the 50% point in a group. Both the psychometric curve that represents the behavior of an individual in multiple trials of a threshold test and the cumulative distribution of a group will resemble an S-shaped function similar to the cumulative normal distribution. A number of methods for graphing such data are shown in the ASTM standard E-1432 (ASTM, 2008b). One simple way to graph the data is to plot the cumulative percentages on “probability paper.” This pre-printed solution provides a graph in which equal standard deviations are marked off along the ordinate, effectively stretching the percentile intervals at the ends and compressing them in the midrange to conform to the density of the normal distribution. Another way to achieve the straightening of the S-shaped response curve is to transform the data by taking z-scores. Statistical packages for data analysis often provide options for transformation of the data. A related method, once widely used in threshold measurement, is called Probit analysis (ASTM, 2008b; Dravnieks and Prokop, 1975; Finney, 1971). In this approach, the individual points are transformed relative to the mean value, divided by the standard deviation, and then a constant value of +5 is added to translate all the numbers to positive values for convenience. A linear fitted function can now be interpolated at the value of 5 to estimate the threshold, as in Fig. 6.6. The conversion (to a z-score +5) tends to make an S-shaped curve more linear. An example of this can be found in the paper by Brown et al. (1978), using data from a multiple paired test. First the percent correct is adjusted for chance. Then the data are transformed from the percent correct (across the group) at each concentration level by conversion to z-scores, and a constant of 5 is added.
The mean value, or Probit equal to 5, can be found by interpolation or curve fitting. An example of this technique for estimating threshold from a group of 20 panelists is shown in Meilgaard et al. (1991) and in ASTM (2008b). Probit plots can be used for any cumulative proportions, as well as for ranked data and for analysis of individuals who are more extensively tested than in the 3-AFC method example shown earlier.

    [Figure 6.6: cumulative percentage and Probit (vertical axes, Probit 3.0 to 7.0) plotted against threshold concentration (horizontal axis, log scale, steps 2 to 64).]

    Fig. 6.6 An example of Probit analysis. Numbers in parentheses are the cumulative percentages of panelists reaching threshold at each concentration step. Note the uneven scale on the left axis. Probits mark off equal standard deviations and are based on the z-score for any proportion plus the constant, 5. Interpolation at the 50% or Probit 5.0 gives the threshold value.
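The Probit steps described above (optional chance correction, conversion to z-score plus 5, a straight-line fit against log concentration, and interpolation at Probit 5) can be sketched with the Python standard library. This is a minimal sketch, not the ASTM procedure: the chance correction shown is the common guessing correction and is assumed here, the least-squares fit is one of several possible fitting choices, and the data are invented rather than read from Fig. 6.6.

```python
from math import log10
from statistics import NormalDist

def chance_corrected(p_observed, p_chance):
    """Adjust a proportion correct for guessing (assumed correction)."""
    return (p_observed - p_chance) / (1.0 - p_chance)

def probit(proportion):
    """z-score of a cumulative proportion plus the constant 5.

    Proportions of exactly 0 or 1 have no finite z-score and raise an error.
    """
    return NormalDist().inv_cdf(proportion) + 5.0

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def probit_threshold(concentrations, cumulative_proportions):
    """Concentration at which the fitted line crosses Probit 5.0 (the 50% point)."""
    xs = [log10(c) for c in concentrations]
    ys = [probit(p) for p in cumulative_proportions]
    slope, intercept = fit_line(xs, ys)
    return 10 ** ((5.0 - intercept) / slope)

# Invented group data: cumulative proportion of panelists at or below threshold.
concentrations = [2, 4, 8, 16, 32]
cumulative = [0.10, 0.30, 0.50, 0.70, 0.90]
threshold = probit_threshold(concentrations, cumulative)

# Example of the guessing correction for a 3-AFC test (chance = 1/3).
adjusted = chance_corrected(2 / 3, 1 / 3)
```

Because the invented proportions are symmetric around 50% at the middle step, the fitted line crosses Probit 5.0 at a concentration of about 8.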

    6.8 Sensory Adaptation, Sequential Effects, and Variability

    Individual variability, both among a group of people and within an individual over repeated measurements, presents a challenge to the idea that the threshold is anything like a fixed value. For example, stable olfactory thresholds of an individual are difficult to measure. The test-retest correlation for an individual’s olfactory threshold is often low (Punter, 1983). Even within an individual, threshold values will generally decrease with practice (Engen, 1960; Mojet et al., 2001; Rabin and Cain, 1986), and superimposed upon this practice effect is a high level of seemingly random variation (Stevens et al., 1988). Individuals may become sensitive to odorants to which they were formerly anosmic, apparently through simple exposure (Wysocki et al., 1989). Increased sensitivity as a function of exposure may be a common phenomenon among women of childbearing age (Dalton et al., 2002; Diamond et al., 2005). Sensory adaptation and momentary changes in sensitivity due to sequences may have occurred in the experiments of Stevens et al. (1988) and could have contributed to some instability in the measurements. As predicted by sequential sensitivity analysis (Masuoka et al., 1995; O’Mahony and Odbert, 1985), the specific stimulus sequence will render discrimination more or less difficult. After the stronger of two stimuli, an additional second strong stimulus presented next may be partially adapted and seem weaker than normal. Stevens et al. remarked that sometimes subjects would get all five pairs correct at one level with some certainty that they “got the scent” but lost the signal at the next level before getting it back. This report and the reversals of performance in threshold data are consistent with adaptation effects temporarily lessening sensitivity. The sensory impression will sometimes “fade in and out” at levels near threshold. In attempts to avoid adaptation effects, other researchers have gone to fewer presentations of the target stimulus. For example, Lawless et al. (1995) used one target among three blank stimuli, a 4-AFC test that has appeared in previous studies (e.g., Engen, 1960; Punter, 1983). This lowers the chance performance level and lessens the potential adaptation at any one concentration step. To guard against the effects of correct guessing, threshold was taken as the lowest concentration step with a correct choice when all higher concentrations were also correct. Thresholds were measured in duplicate ascending runs in a test session, and a duplicate session of two more ascending runs was run on a second day. Correlations across the four ascending series ranged from 0.75 to 0.92 for cineole and from 0.51 to 0.92 for carvone. For carvone, thresholds were better duplicated within a day (r = 0.91 and 0.88) than across days (r from 0.51 to 0.70). This latter result suggests some drift over time in odor thresholds, in keeping with the variability seen by Stevens et al. (1988).
However, results with this ascending method may not be this reliable for all compounds. Using the ascending 4-AFC test and a sophisticated olfactometer, Punter (1983) found median retest correlations for 11 compounds to be only 0.40. The sense of taste may fare somewhat better. In a study of electrogustometric thresholds with ascending paired tests requiring five correct responses, retest correlations for an elderly population were 0.95 (Murphy et al., 1995). In many forced-choice studies, high variability in smell thresholds is also noted across the testing pool.
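The guessing safeguard described above, taking threshold as the lowest concentration step answered correctly when all higher steps were also correct, is easy to state in code. The sketch below assumes one ascending run per call; the function name and the data are hypothetical.

```python
def confirmed_threshold(concentrations, correct):
    """Lowest step answered correctly with every higher step also correct.

    concentrations: ascending concentration steps.
    correct: True/False for each step.
    Returns None when the highest step itself was missed.
    """
    threshold = None
    # Walk down from the top step and stop at the first incorrect response.
    for concentration, ok in zip(reversed(concentrations), reversed(correct)):
        if not ok:
            break
        threshold = concentration
    return threshold

steps = [1, 2, 4, 8, 16, 32]
# The correct answer at the lowest step is treated as a lucky guess
# (chance is 1 in 4 for a 4-AFC test) because two misses follow it.
run = [True, False, False, True, True, True]
threshold = confirmed_threshold(steps, run)
```

For this invented run the threshold is assigned at step 8, not at step 1.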

    Brown et al. (1978) stated that for any test compound, a number of insensitive individuals would likely be seen in the data set when 25 or more persons were tested to determine an average threshold. Among any given pool of participants, a few people with otherwise normal smell acuity will have high thresholds. This is potentially important for sensory professionals who need to screen panelists for detection of specific flavor or odor notes such as defects or taints. In an extensive survey of thresholds for branched-chain fatty acids, Brennand et al. (1989) remarked that “some judges were unable to identify the correct samples in the pairs even in the highest concentrations provided” and that “panelists who were sensitive to most fatty acids found some acids difficult to perceive” (p. 109). Wide variation in sensitivity was also observed to the common flavor compound, diacetyl, a buttery-smelling by-product of lactic bacteria fermentation (Lawless et al., 1994). Also, simple exposure to some chemicals can modify specific anosmia and increase sensitivity (Stevens and O’Connell, 1995).

    6.9 Alternative Methods: Rated Difference, Adaptive Procedures, Scaling

    6.9.1 Rated Difference from Control

    Another practical procedure for estimating threshold has involved the use of ratings on degree-of-difference scales, where a sample containing the to-be-recognized stimulus is compared to some control or blank stimulus (Brown et al., 1978; Lundahl et al., 1986). Rated difference may use a line scale or a category scale, ranging from no difference or “exact same” to a large difference, as discussed in Chapter 4. In these procedures, ratings for the sensory difference from the control sample will increase as the intensity of the target gets stronger. A point on the plot of ratings versus concentration is assigned as threshold. In some variations on this method, a blind control sample is also rated. This provides the opportunity to estimate a baseline or false alarm rate based on the ratings (often nonzero) of the control against itself. Identical samples will often get nonzero difference estimates due to the moment-to-moment variability in sensations.

    In one application of this technique for taste and smell thresholds, a substance was added in various levels to estimate the threshold in a food or beverage. In each individual trial, three samples would be compared to the control sample with no added flavor: two adjacent concentration steps of the target compound and one blind control sample (Lundahl et al., 1986). Samples were rated on a simple 9-point scale, from zero (no difference) to eight (extreme difference). This provided a comparison of the control to itself and a cost-effective way of making three comparisons in one set of samples. Since sample concentrations within the three rated test samples were randomized, the procedure was not a true ascending series and was dubbed the “semi-ascending paired difference method.” How is the threshold defined in these procedures? One approach is to compare the difference ratings for a given level with the difference ratings given to the control sample. Then the threshold can be based on some measure of when these difference ratings diverge, such as when they become significantly different by a t-test (see Brown et al., 1978). Another approach is simply to subtract the difference score given to the blind control from the difference score given to each test sample and treat these adjusted scores as a new data set. In the original paper of Lundahl et al. (1986), this latter method was used. In the analysis, they performed a series of t-tests. Two values were taken to bracket the range of the threshold. The upper level was the first level yielding a significant t-test versus zero, and the lower level was the nearest lower concentration yielding a significant t-test versus the first. This provided an interval in which the threshold (as defined by this method) lies between the two bracketing concentrations.

    One problem with this approach is that when the threshold is based on the statistical significance of t-statistics (or any such significance test), the value of threshold will depend upon the number of observations in the test. This creates a nonsensical situation where the threshold value will decrease as a function of the number of panelists used in the test. This is an irrelevant variable, a choice of the experimenter, and has nothing to do with the physiological sensitivity of the panelist or the biological potency of the substance being tested, a problem recognized by Brown et al. (1978) and later by Marin et al. (1991). Marin et al. also pointed out that a group threshold, based on a larger number of observations than an individual threshold, would be lower than the mean of the individual thresholds, due to the larger number of observations, another oddity of using statistical significance to determine the threshold. Instead of using statistical significance as a criterion, Marin et al. determined the point of maximum curvature on the dose-response curve as the threshold. Such an approach makes sense from consideration of the general form of the dose-response (psychophysical) curve for most tastes and odors. Figure 6.7 shows a semi-log plot for the Beidler taste equation, a widely applied dose-response relationship in studies of the chemical senses (see Chapter 2). This function has two sections of curvature (change in slope, i.e., acceleration) when plotted as a function of log concentration. There is a point at which the response appears to be slowly increasing out from the background noise and then rises steeply to enter the middle of the dynamic range of response. The point of maximum curvature can be estimated graphically or determined from curve fitting and finding the maximum rate of change (i.e., maximum of the second derivative) (Marin et al., 1991).

    [Figure 6.7: psychophysical function (semi-hyperbolic), R = (Rmax C) / (K + C) with K = 1000 and Rmax = 100; response (percent of maximum, 0 to 100) plotted against log concentration (0 to 8), with the zone of inflection marked as the threshold range.]

    Fig. 6.7 Beidler curve.
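The idea of locating the threshold at the maximum of the second derivative can be illustrated numerically for the Beidler curve of Fig. 6.7 (K = 1000, Rmax = 100). The sketch below is only an illustration of that idea, not the fitting procedure of Marin et al.; the function names, grid, and step size are assumptions. On a log-concentration axis the Beidler function is a logistic curve, so the second derivative peaks where the response is (3 - √3)/6, about 21% of Rmax, which corresponds to C = (2 - √3)K, roughly 0.27 K.

```python
from math import log10, sqrt

def beidler(concentration, r_max=100.0, k=1000.0):
    """Beidler equation: R = (Rmax * C) / (K + C)."""
    return r_max * concentration / (k + concentration)

def log_conc_of_max_second_derivative(r_max=100.0, k=1000.0,
                                      low=0.0, high=8.0, step=0.001):
    """Grid search for the log10 concentration where d2R/d(log C)2 is largest."""
    best_x, best_d2 = None, float("-inf")
    n_points = int(round((high - low) / step))
    for i in range(1, n_points):
        x = low + i * step
        # Central finite-difference estimate of the second derivative.
        d2 = (beidler(10 ** (x + step), r_max, k)
              - 2.0 * beidler(10 ** x, r_max, k)
              + beidler(10 ** (x - step), r_max, k)) / step ** 2
        if d2 > best_d2:
            best_x, best_d2 = x, d2
    return best_x

numeric = log_conc_of_max_second_derivative()
analytic = log10(1000.0 * (2.0 - sqrt(3.0)))   # log10 of (2 - sqrt(3)) * K
```

Both values fall near a log concentration of 2.43, inside the lower bend of the curve that the figure labels the zone of inflection. Unlike a significance-based rule, this location does not move when more panelists are added.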

    6.9.2 Adaptive Procedures

    Popular methods for threshold measurement for visual and auditory stimuli have been procedures in which the next stimulus intensity level to be tested depends


    the part of the respondent. Psychophysical researchers have found ways to undo this sequential dependence to counteract observer expectancies. One example is the double random staircase procedure (Cornsweet, 1962), in which trials from two staircase sequences are randomly intermixed. One staircase starts above the threshold and descends, while the other starts below the threshold and ascends. On any given trial, the observer is unaware which of the two sequences the stimulus is chosen from. As in the simple staircase procedure, the level chosen for a trial depends upon detection or discrimination in the previous trial, but of that particular sequence. Further refinements of the procedure involve the introduction of forced choice (Jesteadt, 1980) to eliminate response bias factors involved in simple yes/no detection. Another modification to the adaptive methods has been to adjust the ascending and descending rules so that some number of correct or incorrect judgments is required before changing intensity levels, rather than the one trial as in the simple staircase (Jesteadt, 1980). An example is the “up down transformed response” rule or UDTR (Wetherill and Levitt, 1965). Wetherill and Levitt gave an example where two positive judgments were required before moving down, and only one negative judgment at a given level before moving up. An example is shown in Fig. 6.9. Rather than estimating the 50% point on a traditional psychometric function, this more stringent requirement now tends to converge on the 71% mark as an average of the peaks and valleys in the series. A forced choice can be added to an adaptive procedure. Sometimes

    [Figures 6.8 and 6.9: sample staircase record and sample UDTR record, each plotting concentration step (4 to 64) against trial (1 to 10), with points labeled R.]

    Fig. 6.8 Staircase example.

    Fig. 6.9 Staircase example.
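The two-down, one-up UDTR rule described above can be sketched as a small tracking function. This is a hedged illustration, not the published procedure: the function name, the integer step coding (higher step means stronger stimulus), and the response sequence are all invented. The sketch also shows where the 71% figure comes from: the track is in balance when two positives in a row are as likely as not, that is, when p² = 0.5.

```python
from math import sqrt

def udtr_track(responses, start_step, min_step=0, max_step=10):
    """Apply a two-down, one-up rule to a sequence of True/False judgments.

    Returns (steps_tested, reversal_steps). The step moves down after two
    positive judgments in a row at a level and up after one negative judgment.
    """
    step = start_step
    positives_in_row = 0
    last_move = 0                      # -1 = moved down, +1 = moved up
    tested, reversals = [], []
    for positive in responses:
        tested.append(step)
        if positive:
            positives_in_row += 1
            if positives_in_row < 2:
                continue               # stay at this level for another trial
            move = -1
        else:
            move = +1
        positives_in_row = 0
        if last_move != 0 and move != last_move:
            reversals.append(step)     # direction changed: a peak or a valley
        last_move = move
        step = min(max(step + move, min_step), max_step)
    return tested, reversals

# Invented record of eight trials starting at step 5.
responses = [True, True, True, True, False, True, True, False]
tested, reversals = udtr_track(responses, start_step=5)
estimate = sum(reversals) / len(reversals)   # average of peaks and valleys

# Proportion of positive judgments on which the rule tends to converge.
target = sqrt(0.5)
```

In this invented record the reversals occur at steps 3, 4, and 3, and the target proportion evaluates to about 0.707.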

    6.9.3 Scaling as an Alternative Measure of Sensitivity

    Threshold measurements are not the only way to screen individuals for insensitivity to specific compounds like PTC or to screen for specific anosmia. Do thresholds bear any relation to suprathreshold responding? While it has been widely held that there is no necessary relationship between threshold sensitivity and suprathreshold responding (Frijters, 1978; Pangborn, 1981), this assertion somewhat overstates the case. Counter-examples of good correlations can be seen in tests involving compounds like PTC where there are insensitive groups. For example, there is a -0.8 correlation between simple category taste intensity ratings for PTC and the threshold, when the rated concentration is near the antimode or center between the modes of a bimodal threshold frequency distribution (Lawless, 1980). Thus ratings of a carefully chosen level can be used for a rapid screening method for PTC taster status (e.g., Mela, 1989). Similar results have been noted for smell. Berglund and Högman (1992) reported better reliability of suprathreshold ratings than threshold determinations in screening for olfactory sensitivity. Stevens and O’Connell (1991) used category ratings of perceived intensity as well as qualitative descriptors as a screening tool before threshold testing for specific anosmia. Threshold versus rating correlations were in the range of -0.6 for cineole, -0.3 for carvone, and -0.5 for diacetyl (Lawless et al., 1994, 1995). The correlations were obtained after subtraction of ratings to a blank stimulus, in order to correct for differences in scale usage. Thus there is a moderate negative correlation of sensitivity and rated intensity when one examines the data across a highly variable group, as is the case with specific anosmia or tasting PTC bitterness. The correlation is negative since higher thresholds indicate lower sensitivity and thus lower rated intensity.

    6.10 Dilution to Threshold Measures

    6.10.1 Odor Units and Gas-Chromatography Olfactometry (GCO)

    In this section, several applied methods will be described that make use of the threshold concept in trying to determine the sensory impact of various flavors and odor materials. The first group of methods concerns the olfactory potency of volatile aroma compounds as they are found in foods or food components. The issue here becomes one of not just threshold determination, but determination of both the threshold and the actual concentration present in a food sample. The ratio of these concentrations (actual concentration to threshold concentration) can help indicate whether or not a given flavor substance is likely to contribute to the overall sensory impression in a food. These ratios are commonly called “odor units.” The second, much older method is similar in logic and was developed to determine the point at which the irritative or heat sensations from pepper compounds would be first detectable when diluted to a given extent, the Scoville procedure. Both of these techniques then use dilution-to-threshold as a measure of sensory impact. When a complex natural product like a fruit extract is analyzed for its chemical components, hundreds or even thousands of chemicals may be identified, many of which have odor properties. The number of potential flavor compounds identified in any product seems only to be limited by the resolution and sensitivity of the current methods in analytical chemistry. These methods are always improving, leading to longer and longer lists of possible contributing flavor materials (Piggott, 1990). Flavor scientists need to find a way to narrow the list or to separate those compounds which are most likely contributing to the overall flavor from those compounds that are present in such low concentrations that they are probably not important. Obviously, a sensory-based method is needed in conjunction with the analytical chemistry to provide a bioassay for possible sensory impact (Acree, 1993). Thresholds can be useful in addressing this kind of problem. The reasoning goes that only those compounds that are present in the product in concentrations above their threshold are likely to be contributors to the flavor of the product. There are a number of potential flaws in this thinking discussed below, but for now let us see how this can be put to use. Given a concentration C present in a natural product, a dimensionless quantity can be derived by dividing that concentration by the threshold concentration Ct, and the ratio C/Ct defines the number of odor units (or flavor units) for compounds assessed by smell. According to this logic, only those compounds with odor units greater than one will contribute to the aroma of the product. This reasoning is extended sometimes to include the idea that the greater the number of odor units, the greater the potential contribution. However, it is now widely recognized that the odor unit is a concentration multiple and not a measure of subjective magnitude. Only direct scaling methods can assess the actual magnitude of sensation above threshold and the psychophysical relationship between concentration and odor intensity (Frijters, 1978). Furthermore, this idea ignores the possibility of subthreshold additivity or synergy (Day et al., 1963). A closely related group of chemical compounds might all be present below their individual thresholds, but together could stimulate common receptors so as to produce an above-threshold sensation. Such additivity is not predicted by the odor unit approach, and such a group of compounds could be missed in dilution analysis.
Nonetheless, thresholds provide at least one isointense reference point on the dose-response curve, so they have some utility as a measure of potency used to compare different odor compounds. In analyzing a food, one could look up literature values for all the identified compounds in the product in one of the published compendia of thresholds (e.g., ASTM, 1978; van Gemert, 2003). If the concentration in the product is determined, then the odor unit value can be calculated by simply dividing by threshold. However, it is important to remember that the literature values for thresholds depend upon the method and the medium of testing. Unless the same techniques and the same carrier medium were used (rarely the case), the values may not be comparable for different compounds. A second approach is to actually measure the dilutions necessary to reach threshold for each compound, starting with the product itself. This necessitates the use of a separatory procedure, so that each compound may be individually perceived. The combination of gas chromatography with odor port sniffing of a dilution series is a popular technique (Acree, 1993). Various catchy names have been applied to such techniques in the flavor literature, including Aroma Extract Dilution Analysis (for examples, see Guth and Grosch, 1994; Milo and Grosch, 1993; Schieberle and Grosch, 1988), CHARM analysis (Acree et al., 1984) or, more generically, gas chromatography olfactometry or GCO. The basis of these techniques is to have subjects respond when an odor is perceived when sniffing the exit port during a GC run. In recent years, the effluent has been embedded in a cooled, humidified air stream to improve the comfort of the observer and to increase sensory resolution of the eluting compounds. Over several dilutions, the response will eventually drop out, and the index of smell potency is related to the reciprocal of the dilution factor. The sniffer’s responses occur on a time base that can be cross-referenced to a retention index, and then the identity of the compound can be determined by a combination of retention index, mass spectrometry, and aroma character. In practice, these techniques considerably shorten the list of potential aroma compounds contributing to flavor in a natural product (e.g., Cunningham et al., 1986). The method has also been used as an assessment technique for measuring the sensitivity of human panelists, as opposed to being a tool to determine the sensory impact of flavor compounds (Marin et al., 1988).
In this approach, the gas chromatograph once again serves as an olfactometer. Known compounds can be presented as a dilution series of mixtures. Variation in individual thresholds can readily be assessed for a variety of compounds since they can be combined in a GC run as long as they have different retention times, i.e., do not co-elute on the column of choice. The potential use of GCO for screening panelists, for assessing odor responses in general, and for assessing specific anosmias has also been attempted (Friedrich and Acree, 2000; Kittel and Acree, 2008).
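The odor unit calculation described in this section is a simple ratio, C/Ct, and can be sketched as below. The compound names, concentrations, and thresholds are invented for illustration, and the function name is hypothetical; real values would have to be measured in the same medium and by the same method.

```python
def odor_units(concentration, threshold):
    """Odor units: concentration in the product divided by threshold (C / Ct)."""
    return concentration / threshold

# Hypothetical compounds: (concentration in product, threshold), same units.
compounds = {
    "compound A": (50.0, 2.0),
    "compound B": (0.5, 4.0),
    "compound C": (3.0, 3.0),
}
units = {name: odor_units(c, ct) for name, (c, ct) in compounds.items()}

# By the odor-unit logic, only values greater than one suggest a contribution.
likely_contributors = [name for name, value in units.items() if value > 1.0]
```

Here only compound A exceeds one odor unit. As the text cautions, the ratio is a concentration multiple rather than a perceived intensity, and a screen like this one would miss compounds that act together below their individual thresholds.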

    6.10.2 Scoville Units

    Another example of a dilution method is the traditional Scoville procedure for scoring pepper heat in the spice trade. This procedure was named for W. Scoville, who worked in the pharmaceutical industry in the early twentieth century. He was interested in the topical application of spice compounds like pepper extracts as counterirritants, and he needed to establish units that could be used to measure their potency. His procedure consisted of finding the number of dilutions necessary for sensations to disappear and then using this number of dilutions as an estimate of potency. In other words, potency was defined as a reciprocal threshold. A variation of this procedure was adopted by the Essential Oil Association, British Standards Institution, International Standards Organization, and American Spice Trade Association (ASTA), and was adopted as an Indian standard method (for a review, see Govindarajan, 1987). The procedure defined units of pungency as the highest dilution at which a definite “bite” would be perceived and thus contains instructions consistent with a recognition threshold. Scoville units were dilution factors, now commonly given as mL/g. ASTA Method 21 (ASTA, 1968) is widely used and contains some modifications in an attempt to overcome problems with the original Scoville procedure. In brief, the method proceeds as follows: Panelists are screened for acuity relative to experienced persons. Dilution schedules are provided which simplify calculations of the eventual potency. Solutions are tested in 5% sucrose and negligible amounts of alcohol. Five panelists participate and concentrations are given in ascending order around the estimated threshold. Threshold is defined as the concentration at which three out of five judges respond positively. This method is difficult in practice, and a number of additional variations have been tried to improve on the accuracy and precision of the method (Govindarajan, 1987).
Examples include the following: (1) substitution of other rules for 3/5, e.g., mean + SD of 20-30 judgments, (2) use of a triangle test, rather than simple yes/no at each concentration, (3) requiring recognition of pungency (Todd et al., 1977), (4) reduction of sucrose concentration in the carrier solution to 3%, and (5) use of a rating scale from 1 (definitely not detectable) to 6 (definitely detectable). This latter modification defined a detection threshold at a mean scale value of 3.5. Almost all these methods specify mandatory rest periods between samples due to the long-lasting nature of these sensations. The measurements are still difficult. One problem is that capsaicin, the active heat principle in red pepper, is prone to desensitize observers within a session, and regular consumers of hot spices also become less sensitive, leading to wide individual differences in sensitivity among panelists (Green, 1989; Lawless et al., 1985). Alternative procedures have been developed based on rating scales with fixed physical references (Gillette et al., 1984), and these have been endorsed by ASTM (2008c) as standard test methods. These rating scale procedures show good correlations with instrumental measures of capsaicin content in pepper samples and can be cross-referenced to Scoville units for those who prefer to do business in the traditional units (Gillette et al., 1984).
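The three-out-of-five rule and the idea of potency as a reciprocal threshold can be sketched as follows. This is an illustration only, not ASTA Method 21 itself: the function names and panel data are invented, and the conversion assumes the threshold is expressed in grams of spice per milliliter of solution so that its reciprocal is a dilution factor in mL/g.

```python
def panel_threshold(series, required=3):
    """Lowest concentration at which at least `required` judges respond positively.

    series: list of (concentration, judgments) in ascending concentration order,
    where judgments holds one True/False per judge.
    """
    for concentration, judgments in series:
        if sum(judgments) >= required:
            return concentration
    return None                        # criterion never met in the range tested

def dilution_factor(threshold_g_per_ml):
    """Reciprocal threshold, in mL of solution per gram of spice."""
    return 1.0 / threshold_g_per_ml

# Invented responses from five judges at three ascending concentrations (g/mL).
series = [
    (0.00001, [False, False, False, True, False]),   # 1 of 5 positive
    (0.00002, [True, False, True, True, False]),     # 3 of 5 positive
    (0.00004, [True, True, True, True, False]),      # 4 of 5 positive
]
threshold = panel_threshold(series)
units = dilution_factor(threshold)
```

For these invented data the criterion is first met at 0.00002 g/mL, giving a dilution factor of about 50,000 mL/g. Changing `required` shows how strongly the result depends on the decision rule, one of the procedural variations listed above.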

    6.11 Conclusions Threshold measurements find three common uses in sensory analysis and flavor research. First, they can be used to compare the sensitivities of different panelists. Second, they can be used as an index of the biological potency of a flavor compound. Third, they can provide useful information regarding the maximum tolerable levels of an off-flavor or taint. A variety of different techniques have been used to find thresholds or have employed the threshold concept in practical flavor work. Examples of different threshold methods are given in Table 6.4. In spite of their practical applications, the usefulness of threshold measures is often questioned in sensory evaluation. One criticism is that thresholds are only one point on an intensity function and thus they do not tell us anything about abovethreshold responding. There are some good examples in which thresholds do not pdict or do not correlate very well with suprathreshold responses. For example, patients irradiated for cancer may lose their sense of taste temporarily and thresholds return to normal long before suprathreshold responsiveness is recovered (Bartoshuk, 1987). However, as we have seen both in the case of PTC tasting and in specific anosmias, insensitive inpiduals (as determined by their threshold) will also tend to be less responsive above

    Table 6.4 Threshold procedures

    Method | Citations/examples | Response | Threshold
    Ascending forced choice | ASTM E-679-79 | 3-AFC | Geometric mean of individual threshold
    Ascending forced choice | Stevens et al. (1988) | 2-AFC, 5 replicates | Lowest correct set with confirmation at next concentration
    Semi-ascending paired difference | Lundahl et al. (1986) | Rated difference from control | t-test for significant difference from zero with all blank trials (blind control scores) subtracted
    Adaptive, up-down transformed response rule | Reed et al. (1995) | 2-AFC | Average of responses ascending after one incorrect, descending after two correct at each level
    Double random staircase | Cornsweet (1962) | Yes/No | Average of reversal points
    CHARM analysis | Acree et al. (1986) | Yes/No | Nonresponse on descending concentration runs (implied)

    threshold. These correlations are strong when comparing individuals of very different sensitivities, but the threshold-suprathreshold parallel may not extend to all flavor compounds. A more complete understanding of the whole dynamic range of dose-response, as found in scaling studies, would be more informative.

    Other shortcomings need to be kept in mind by sensory workers who would use threshold measurements for making product decisions. First, thresholds are statistical constructs only. Signal detection theory warns us that the signal and noise diverge in a continuous fashion, and discontinuities in perception may be an idealized construction that is comfortable but unrealistic. There is no sudden transition from nondetection to 100% detection. Any modern concept of threshold must take into account that a range of values, rather than a single point, is involved in the specification. Thresholds depend upon the conditions of measurement. For example, as the purity of the diluent increases, the threshold for taste will go down. So a measure of the true absolute threshold for taste (if such a thing existed) would require water of infinite purity. The threshold exists not in this abstract sense, but only as a potentially useful construct of our methods and informational requirements.

    Finally, because of the problems mentioned above, sensory professionals need to keep the following principles firmly in mind when working with threshold procedures: First, changes in method will change the obtained values. Literature values cannot be trusted to extend to a new product or a new medium, or to remain valid if changes are made in the test procedure. Next, threshold distributions do not always follow a normal bell curve. There are often high outliers and possibly cases of insensitivity due to inherited deficits like specific anosmia (Amoore, 1971; Brown et al., 1978; Lawless et al., 1994). Threshold values for an individual are prone to high variability and low test-retest reliability. An individual's threshold measure on a given day is not necessarily a stable characteristic of that person (Lawless et al., 1995; Stevens et al., 1988). Practice effects can be profound, and thresholds may stabilize over a period of time (Engen, 1960; Rabin and Cain, 1986). However, group averaged thresholds are reliable (Brown et al., 1978; Punter, 1983) and provide a useful index of the biological activity of a stimulus.
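    The adaptive up-down rule summarized in Table 6.4 (ascend after one incorrect response, descend after two correct responses at a level), combined with averaging of reversal points, can be sketched in Python. This is an illustrative composite under stated assumptions, not the exact procedure of Reed et al. (1995) or Cornsweet (1962): the function name, the integer level indices, the `respond` callback, and the deterministic toy observer are all hypothetical.

```python
def up_down_staircase(respond, start, lowest, highest, n_reversals=6, max_trials=1000):
    """Run a 1-up/2-down staircase over integer concentration levels.

    respond(level) -> bool is a hypothetical callback: True for a correct choice.
    Returns (mean of reversal levels, list of reversal levels).
    """
    level = start
    correct_in_row = 0
    direction = 0          # +1 ascending, -1 descending, 0 before the first move
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue   # need a second correct response before descending
            step = -1
        else:
            step = +1      # a single miss sends the series upward
        correct_in_row = 0
        if direction != 0 and step != direction:
            reversals.append(level)   # the direction changed at this level
        direction = step
        level = min(max(level + step, lowest), highest)
    if not reversals:
        raise ValueError("no reversals observed; widen the level range")
    return sum(reversals) / len(reversals), reversals


# Toy observer (assumption): always correct at level 4 or above, never below.
estimate, points = up_down_staircase(lambda level: level >= 4, start=0, lowest=0, highest=7)
print(estimate, points)
```

    With a real panelist the responses are probabilistic, so the reversal levels scatter around the point where two correct responses in a row become about as likely as not. In practice the levels would index a concentration series, and the average would usually be taken on the log concentrations.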

    Appendix: MTBE Threshold Data for Worked Example

    Concentration (μg/L)

    Panelist     2    3.5  6    10   18   30   60   100  BET   log(BET)
    1            +    o    +    +    +    +    +    +    4.6   0.663
    2            o    o    o    +    +    +    +    +    7.7   0.886
    3            o    +    o    o    o    o    +    +    42    1.623
    4            o    o    o    o    o    o    +    +    42    1.623
    5            +    o    +    o    +    +    o    +    77    1.886
    6            o    o    +    +    +    +    +    +    4.6   0.663
    7            o    +    +    o    +    +    +    +    13    1.114
    8            +    +    o    +    +    +    +    +    7.7   0.886
    9            o    o    +    o    +    +    +    +    13    1.114
    10           o    o    o    o    o    o    o    +    77    1.886
    11           +    o    +    +    +    +    +    +    4.6   0.663
    12           o    o    o    o    +    o    +    o    132   2.121
    13           +    +    +    +    +    +    +    +    1.4   0.146
    14           +    +    +    +    +    +    +    +    1.4   0.146
    15           o    +    +    o    +    +    +    +    13    1.114
    16           o    o    +    o    o    o    +    o    132   2.121
    17           +    o    +    +    +    +    +    +    4.6   0.663
    18           o    o    +    +    +    +    +    +    4.6   0.663
    19           +    +    +    +    +    +    +    +    1.4   0.146
    20           +    o    +    +    o    +    o    o    132   2.121
    21           +    +    +    +    +    +    +    +    1.4   0.146
    22           +    +    +    +    +    +    +    +    1.4   0.146
    23           +    +    o    o    +    +    +    +    13    1.114
    24           +    +    +    +    +    +    +    +    1.4   0.146
    25           o    +    +    +    o    +    +    +    23    1.362
    26           o    +    +    +    o    +    +    +    23    1.362
    27           +    o    o    o    +    o    o    +    77    1.886
    28           o    o    +    +    +    +    +    o    132   2.121
    29           o    +    o    o    +    +    +    +    13    1.114
    30           o    +    +    o    +    +    +    +    13    1.114
    31           +    o    o    +    o    o    +    o    132   2.121
    32           +    +    +    +    o    o    o    +    77    1.886
    33           +    +    +    o    +    o    +    +    42    1.623
    34           o    o    o    o    o    o    +    o    132   2.121
    35           o    o    o    +    o    +    +    o    132   2.121
    36           o    o    o    +    o    +    +    +    23    1.362
    37           +    +    +    +    +    +    +    +    1.4   0.146
    38           +    o    +    +    +    +    +    +    1.4   0.146
    39           +    +    o    o    +    +    +    +    13    1.114
    40           o    o    +    o    +    +    +    +    13    1.114
    41           +    +    +    +    +    +    +    +    1.4   0.146
    42           o    +    o    +    +    +    +    +    7.7   0.886
    43           o    o    o    o    o    o    +    +    42    1.623
    44           o    +    +    o    +    o    +    +    42    1.623
    45           o    +    +    o    +    +    +    +    13    1.114
    46           +    +    +    +    +    +    o    +    77    1.886
    47           o    +    o    o    o    +    +    o    132   2.121
    48           o    +    o    +    o    +    +    +    23    1.362
    49           o    o    o    +    o    +    +    +    23    1.362
    50           o    o    +    +    +    +    +    +    4.6   0.663
    51           o    o    o    +    +    +    +    +    7.7   0.886
    52           +    +    +    +    +    +    +    +    1.4   0.146
    53           +    +    +    +    +    +    +    +    1.4   0.146
    54           +    o    o    o    o    +    +    +    23    1.362
    55           o    o    +    +    +    +    +    +    4.6   0.663
    56           o    o    o    o    o    +    +    +    23    1.362
    57           o    o    +    +    o    +    o    +    77    1.886
    Prop. Corr.  0.44 0.49 0.61 0.58 0.65 0.77 0.89 0.86

    Mean (log(BET)) = 1.154; 10^1.154 = 14.24
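    The BET and group values in this appendix can be reproduced with a short Python sketch. It assumes that each best-estimate threshold is the geometric mean of the last concentration missed and the next higher step, and that the group threshold is the geometric mean of the individual BETs. The two out-of-range steps (1.0 and 175 μg/L) are not stated here; they are assumptions chosen because they reproduce the tabled values of 1.4 and 132. Function and variable names are hypothetical.

```python
import math

CONCS = [2, 3.5, 6, 10, 18, 30, 60, 100]   # ascending series, μg/L
LOWER_STEP = 1.0     # assumed step below the series (reproduces the tabled 1.4)
UPPER_STEP = 175.0   # assumed step above the series (reproduces the tabled 132)


def best_estimate_threshold(responses):
    """responses: '+' (correct) or 'o' (incorrect) per concentration, ascending."""
    steps = [LOWER_STEP] + CONCS + [UPPER_STEP]
    last_miss = 0                      # 0 means the panelist never missed
    for i, r in enumerate(responses, start=1):
        if r == "o":
            last_miss = i
    # Geometric mean of the last missed step and the next higher step.
    return math.sqrt(steps[last_miss] * steps[last_miss + 1])


def group_threshold(bets):
    """Geometric mean of individual BETs, i.e. 10 ** mean(log10(BET))."""
    mean_log = sum(math.log10(b) for b in bets) / len(bets)
    return 10 ** mean_log


# Panelists 1-6 as read from the appendix (columns 2 ... 100 μg/L).
panel = ["+o++++++", "ooo+++++", "o+oooo++", "oooooo++", "+o+o++o+", "oo++++++"]
bets = [best_estimate_threshold(row) for row in panel]
print([round(b, 1) for b in bets])

# Tabled BET values and how many of the 57 panelists share each one.
tabled_counts = {1.4: 11, 4.6: 7, 7.7: 4, 13: 9, 23: 7, 42: 5, 77: 6, 132: 8}
all_bets = [bet for bet, n in tabled_counts.items() for _ in range(n)]
print(round(group_threshold(all_bets), 2))
```

    Under these assumptions the six individual values agree with the tabled BETs to within rounding, and the group value comes out close to the tabled 14.24. Working on the log scale keeps a few very insensitive panelists from dominating the group estimate.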

    References

    Acree, T. E. 1993. Bioassays for flavor. In: T. E. Acree and R. Teranishi (eds.), Flavor Science, Sensible Principles and Techniques. American Chemical Society Books, Washington, pp. 1-20.
    Acree, T. E., Barnard, J. and Cunningham, D. G. 1984. A procedure for the sensory analysis for gas chromatographic effluents. Food Chemistry, 14, 273-286.
    American Spice Trade Association 1968. Pungency of capsicum spices and oleoresins. American Spice Trade Association Official Analytical Methods, 21.0, 43-47.
    Amoore, J. E. 1971. Olfactory genetics and anosmia. In: L. M. Beidler (ed.), Handbook of Sensory Physiology. Springer, Berlin, pp. 245-256.
    Amoore, J. E. 1979. Directions for preparing aqueous solutions of primary odorants to diagnose eight types of specific anosmia. Chemical Senses and Flavor, 4, 153-161.
    Amoore, J. E., Venstrom, D. and Davis, A. R. 1968. Measurement of specific anosmia. Perceptual and Motor Skills, 26, 143-164.
    Antinone, M. A., Lawless, H. T., Ledford, R. A. and Johnston, M. 1994. The importance of diacetyl as a flavor component in full fat cottage cheese. Journal of Food Science, 59, 38-42.
    ASTM. 1978. Compilation of Odor and Taste Threshold Values Data. American Society for Testing and Materials, Philadelphia.
    ASTM. 2008a. Standard practice for determining odor and taste thresholds by a forced-choice ascending concentration series method of limits, E-679-04. Annual Book of Standards, Vol. 15.08. ASTM International, Conshohocken, PA, pp. 36-42.
    ASTM. 2008b. Standard practice for defining and calculating individual and group sensory thresholds from forced-choice data sets of intermediate size, E-1432-04. ASTM International Book of Standards, Vol. 15.08. ASTM International, Conshohocken, PA, pp. 82-89.
    ASTM. 2008c. Standard test method for sensory evaluation of red pepper heat, E-1083-00. ASTM International Book of Standards, Vol. 15.08. ASTM International, Conshohocken, PA, pp. 49-53.
    Ayya, N. and Lawless, H. T. 1992. Qualitative and quantitative evaluation of high-intensity sweeteners and sweetener mixtures. Chemical Senses, 17, 245-259.
    Bartoshuk, L. M. 1987. Psychophysics of taste. American Journal of Clinical Nutrition, 31, 1068-1077.
    Bartoshuk, L. M., Murphy, C. and Cleveland, C. T. 1978. Sweet taste of dilute NaCl: Psychophysical evidence for a sweet stimulus. Physiology and Behavior, 21, 609-613.
    Berglund, B. and Högman, L. 1992. Reliability and validity of odor measurements near the detection threshold. Chemical Senses, 17, 823-824.
    Blakeslee, A. F. 1932. Genetics of sensory thresholds: Taste for phenylthiocarbamide. Proceedings of the National Academy of Sciences, 18, 120-130.
    Brennand, C. P., Ha, J. K. and Lindsay, R. C. 1989. Aroma properties and thresholds of some branched chain and other minor volatile fatty acids occurring in milkfat and meat lipids. Journal of Sensory Studies, 4, 105-120.
    Boring, E. G. 1942. Sensation and Perception in the History of Experimental Psychology. Appleton-Century-Crofts, New York, pp. 41-42.
    Brown, D. G. W., Clapperton, J. F., Meilgaard, M. C. and Moll, M. 1978. Flavor thresholds of added substances. American Society of Brewing Chemists Journal, 36, 73-80.
    Bufe, B., Breslin, P. A. S., Kuhn, C., Reed, D., Tharp, C. D., Slack, J. P., Kim, U.-K., Drayna, D. and Meyerhof, W. 2005. The molecular basis of individual differences in phenylthiocarbamide and propylthiouracil bitterness perception. Current Biology, 15, 322-327.
    Cain, W. S. 1976. Olfaction and the common chemical sense: Some psychophysical contrasts. Sensory Processes, 1, 57-67.
    Cain, W. S. and Murphy, C. L. 1980. Interaction between chemoreceptive modalities of odor and irritation. Nature, 284, 255-257.
    Cornsweet, T. M. 1962. The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.
    Cunningham, D. G., Acree, T. E., Barnard, J., Butts, R. M. and Braell, P. A. 1986. CHARM analysis of apple volatiles. Food Chemistry, 19, 137-147.
    Dale, M. S., Moylan, M. S., Koch, B. and Davis, M. K. 1997. Taste and odor threshold determinations using the flavor profile method. Proceedings of the American Water Works Association Water-Quality Technology Conference, Denver, CO.
    Dalton, P., Doolittle, N. and Breslin, P. A. S. 2002. Gender-specific induction of enhanced sensitivity to odors. Nature Neuroscience, 5, 199-200.
    Day, E. A., Lillard, D. A. and Montgomery, M. W. 1963. Autooxidation of milk lipids. III. Effect on flavor of the additive interactions of carbonyl compounds at subthreshold concentrations. Journal of Dairy Science, 46, 291-294.
    Diamond, J., Dalton, P., Doolittle, N. and Breslin, P. A. S. 2005. Gender-specific olfactory sensitization: Hormonal and cognitive influences. Chemical Senses, 30(suppl 1), i224-i225.
    Dravnieks, A. and Prokop, W. H. 1975. Source emission odor measurement by a dynamic forced-choice triangle olfactometer. Journal of the Air Pollution Control Association, 25, 28-35.
    Engen, T. E. 1960. Effects of practice and instruction on olfactory thresholds. Perceptual and Motor Skills, 10, 195-198.
    Finney, D. J. 1971. Probit Analysis, Third Edition. Cambridge University, London.
    Friedrich, J. E. and Acree, T. E. 2000. Design of a standard set of odorants to test individuals for specific anosmia. Frontiers in Flavour Science [Proc. 9th Weurman Flavour Res. Symp.], 230-234.
    Frijters, J. E. R. 1978. A critical analysis of the odour unit number and its use. Chemical Senses and Flavour, 3, 227-233.
    Gillette, M., Appel, C. E. and Lego, M. C. 1984. A new method for sensory evaluation of red pepper heat. Journal of Food Science, 49, 1028-1033.
    Green, B. G. 1989. Capsaicin sensitization and desensitization on the tongue produced by brief exposures to a low concentration. Neuroscience Letters, 107, 173-178.
    Govindarajan, V. S. 1987. Capsicum - production, technology, chemistry and quality. Part III. Chemistry of the color, aroma and pungency stimuli. CRC Critical Reviews in Food Science and Nutrition, 24, 245-311.
    Guth, H. and Grosch, W. 1994. Identification of the character impact odorants of stewed beef juice by instrumental analysis and sensory studies. Journal of Agricultural and Food Chemistry, 42, 2862-2866.
    Harris, H. and Kalmus, H. 1949. The measurement of taste sensitivity to phenylthiourea (P.T.C.). Annals of Eugenics, 15, 24-31.
    Harvey, L. O. 1986. Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments and Computers, 18, 623-632.
    James, W. 1913. Psychology. Henry Holt and Co., New York.
    Jesteadt, W. 1980. An adaptive procedure for subjective judgments. Perception and Psychophysics, 28(1), 85-88.
    Kittel, K. M. and Acree, T. E. 2008. Investigation of olfactory deficits using gas-chromatography olfactometry. Manuscript submitted, available from the authors.
    Lawless, H. T. 1980. A comparison of different methods used to assess sensitivity to the taste of phenylthiocarbamide PTC. Chemical Senses, 5, 247-256.
    Lawless, H. T. 2010. A simple alternative analysis for threshold data determined by ascending forced-choice method of limits. Journal of Sensory Studies, 25, 332-346.
    Lawless, H. T., Antinone, M. J., Ledford, R. A. and Johnston, M. 1994. Olfactory responsiveness to diacetyl. Journal of Sensory Studies, 9(1), 47-56.
    Lawless, H. T., Rozin, P. and Shenker, J. 1985. Effects of oral capsaicin on gustatory, olfactory and irritant sensations and flavor identification in humans who regularly or rarely consume chili pepper. Chemical Senses, 10, 579-589.
    Lawless, H. T., Thomas, C. J. C. and Johnston, M. 1995. Variation in odor thresholds for l-carvone and cineole and correlations with suprathreshold intensity ratings. Chemical Senses, 20, 9-17.
    Linschoten, M. R., Harvey, L. O., Eller, P. A. and Jafek, B. W. 1996. Rapid and accurate measurement of taste and smell thresholds using an adaptive maximum-likelihood staircase procedure. Chemical Senses, 21, 633-634.
    Lundahl, D. S., Lukes, B. K., McDaniel, M. R. and Henderson, L. A. 1986. A semi-ascending paired difference method for determining the threshold of added substances to background media. Journal of Sensory Studies, 1, 291-306.
    Masuoka, S., Hatjopolous, D. and O'Mahony, M. 1995. Beer bitterness detection: Testing Thurstonian and sequential sensitivity analysis models for triad and tetrad methods. Journal of Sensory Studies, 10, 295-306.
    Marin, A. B., Acree, T. E. and Barnard, J. 1988. Variation in odor detection thresholds determined by charm analysis. Chemical Senses, 13, 435-444.
    Marin, A. B., Barnard, J., Darlington, R. B. and Acree, T. E. 1991. Sensory thresholds: Estimation from dose-response curves. Journal of Sensory Studies, 6(4), 205-225.
    McBurney, D. H. and Collings, V. B. 1977. Introduction to Sensation and Perception. Prentice-Hall, Englewood Cliffs, NJ.
    Meilgaard, M., Civille, G. V. and Carr, B. T. 1991. Sensory Evaluation Techniques, Second Edition. CRC Press, Boca Raton, FL.
    Mela, D. 1989. Bitter taste intensity: The effect of tastant and thiourea taster status. Chemical Senses, 14, 131-135.
    Milo, C. and Grosch, W. 1993. Changes in the odorants of boiled trout (Salmo fario) as affected by the storage of the raw material. Journal of Agricultural and Food Chemistry, 41, 2076-2081.
    Mojet, J., Christ-Hazelhof, E. and Heidema, J. 2001. Taste perception with age: Generic or specific losses in threshold sensitivity to the five basic tastes. Chemical Senses, 26, 854-860.
    Morrison, D. G. 1978. A probability model for forced binary choices. American Statistician, 32, 23-25.
    Murphy, C., Quiñonez, C. and Nordin, S. 1995. Reliability and validity of electrogustometry and its application to young and elderly persons. Chemical Senses, 20(5), 499-515.
    O'Mahony, M. and Ishii, R. 1986. Umami taste concept: Implications for the dogma of four basic tastes. In: Y. Kawamura and M. R. Kare (eds.), Umami: A Basic Taste. Marcel Dekker, New York, pp. 75-93.
    O'Mahony, M. and Odbert, N. 1985. A comparison of sensory difference testing procedures: Sequential sensitivity analysis and aspects of taste adaptation. Journal of Food Science, 50, 1055-1060.
    Pangborn, R. M. 1981. A critical review of threshold, intensity and descriptive analyses in flavor research. In: Flavor '81. Walter de Gruyter, Berlin, pp. 3-32.
    Piggott, J. R. 1990. Relating sensory and chemical data to understand flavor. Journal of Sensory Studies, 4, 261-272.
    Prescott, J., Norris, L., Kunst, M. and Kim, S. 2005. Estimating a "consumer rejection threshold" for cork taint in white wine. Food Quality and Preference, 18, 345-349.
    Punter, P. H. 1983. Measurement of human olfactory thresholds for several groups of structurally related compounds. Chemical Senses, 7, 215-235.
    Rabin, M. D. and Cain, W. S. 1986. Determinants of measured olfactory sensitivity. Perception and Psychophysics, 39(4), 281-286.
    Reed, D. R., Bartoshuk, L. M., Duffy, V., Marino, S. and Price, R. A. 1995. Propylthiouracil tasting: Determination of underlying threshold distributions using maximum likelihood. Chemical Senses, 20, 529-533.
    Saliba, A. J., Bullock, J. and Hardie, W. J. 2009. Consumer rejection threshold for 1,8 cineole (eucalyptol) in Australian red wine. Food Quality and Preference, 20, 500-504.
    Schieberle, P. and Grosch, W. 1988. Identification of potent flavor compounds formed in an aqueous lemon oil/citric acid emulsion. Journal of Agricultural and Food Chemistry, 36, 797-800.
    Stevens, D. A. and O'Connell, R. J. 1991. Individual differences in threshold and quality reports of subjects to various odors. Chemical Senses, 16, 57-67.
    Stevens, D. A. and O'Connell, R. J. 1995. Enhanced sensitivity to androstenone following regular exposure to pemenone. Chemical Senses, 20, 413-419.
    Stevens, J. C., Cain, W. S. and Burke, R. J. 1988. Variability of olfactory thresholds. Chemical Senses, 13, 643-653.
    Stocking, A. J., Suffet, I. H., McGuire, M. J. and Kavanaugh, M. C. 2001. Implications of an MTBE odor study for setting drinking water standards. Journal AWWA, March 2001, 95-105.
    Todd, P. H., Bensinger, M. G. and Biftu, T. 1977. Determination of pungency due to capsicum by gas-liquid chromatography. Journal of Food Science, 42, 660-665.
    Tuorila, H., Kurkela, R., Suihko, M. and Suhonen. 1981. The influence of tests and panelists on odour detection thresholds. Lebensmittel Wissenschaft und Technologie, 15, 97-101.
    USEPA (U.S. Environmental Protection Agency). 2001. Statistical analysis of MTBE odor detection thresholds in drinking water. National Service Center for Environmental Publications (NSCEP) # 815R01024, available from http://nepis.epa.gov.
    Van Gemert, L. 2003. Odour Thresholds. Oliemans, Punter & Partners, Utrecht, The Netherlands.
    Viswanathan, S., Mathur, G. P., Gnyp, A. W. and St. Pierre, C. C. 1983. Application of probability models to threshold determination. Atmospheric Environment, 17, 139-143.
    Walker, J. C., Hall, S. B., Walker, D. B., Kendall-Reed, M. S., Hood, A. F. and Nio, X.-F. 2003. Human odor detectability: New methodology used to determine threshold and variation. Chemical Senses, 28, 817-826.
    Wetherill, G. B. and Levitt, H. 1965. Sequential estimation of points on a psychometric function. British Journal of Mathematical and Statistical Psychology, 18(1), 1-10.
    Wysocki, C. J., Dorries, K. M. and Beauchamp, G. K. 1989. Ability to perceive androstenone can be acquired by ostensibly anosmic people. Proceedings of the National Academy of Science, USA, 86, 7976-7989.

    Chapter 7

    Scaling

    Abstract Scaling describes the application of numbers, or judgments that are converted to numerical values, to describe the perceived intensity of a sensory experience or the degree of liking or disliking for some experience or product. Scaling forms the basis for the sensory method of descriptive analysis. A variety of methods have been used for this purpose and, with some caution, all work well in differentiating products. This chapter discusses theoretical issues as well as practical considerations in scaling.

    The vital importance of knowing the properties and limitations of a measuring instrument can hardly be denied by most natural scientists. However, the use of many different scales for sensory measurement is common within food science; but very few of these have ever been validated. . . .
    -(Land and Shepard, 1984, pp. 144-145)

    Contents

    7.1 Introduction
    7.2 Some Theory
    7.3 Common Methods of Scaling
      7.3.1 Category Scales
      7.3.2 Line Scaling
      7.3.3 Magnitude Estimation
    7.4 Recommended Practice and Practical Guidelines
      7.4.1 Rule 1: Provide Sufficient Alternatives
      7.4.2 Rule 2: The Attribute Must Be Understood
      7.4.3 Rule 3: The Anchor Words Should Make Sense
      7.4.4 To Calibrate or Not to Calibrate
      7.4.5 A Warning: Grading and Scoring are Not Scaling
    7.5 Variations - Other Scaling Techniques
      7.5.1 Cross-Modal Matches and Variations on Magnitude Estimation
      7.5.2 Category-Ratio (Labeled Magnitude) Scales
      7.5.3 Adjustable Rating Techniques: Relative Scaling
      7.5.4 Ranking
      7.5.5 Indirect Scales
    7.6 Comparing Methods: What is a Good Scale?
    7.7 Issues
      7.7.1 “Do People Make Relative Judgments” Should They See Their Previous Ratings?
      7.7.2 Should Category Rating Scales Be Assigned Integer Numbers in Data Tabulation? Are They Interval Scales?
      7.7.3 Is Magnitude Estimation a Ratio Scale or Simply a Scale with Ratio Instructions?
      7.7.4 What is a “Valid” Scale?
    7.8 Conclusions
    Appendix 1: Derivation of Thurstonian-Scale Values for the 9-Point Scale
    Appendix 2: Construction of Labeled Magnitude Scales
    References

    H.T. Lawless, H. Heymann, Sensory Evaluation of Food, Food Science Text Series, DOI 10.1007/978-1-4419-6488-5_7, © Springer Science+Business Media, LLC 2010

    7.1 Introduction

    People make changes in their behavior all the time based on sensory experience and very often this involves a judgment of how strong or weak something feels. We add more sugar to our coffee if it is not sweet enough. We adjust the thermostat in our home if it is too cold or too hot. If a closet is too dark to find your shoes you turn the light on. We apply more force to


    chew a tough piece of meat if it will not disintegrate to allow swallowing. These behavioral decisions seem automatic and do not require a numerical response. But the same kinds of experiences can be evaluated with a response that indicates the strength of the sensation. What was subjective and private becomes public data. The data are quantitative. This is the basis of scaling.

    The methods of scaling involve the application of numbers to quantify sensory experiences. It is through this process of numerification that sensory evaluation becomes a quantitative science subject to statistical analysis, modeling, prediction, and hard theory. However, as noted in the quote above, in the practical application of sensory test methods, the nature of this process of assigning numbers to experiences is rarely questioned and deserves scrutiny. Clearly numbers can be assigned to sensations by a panelist in a variety of ways, some by mere categorization, or by ranking, or in ways that attempt to reflect the intensity of sensory experience. This chapter will illustrate these techniques and discuss the arguments that have been raised to substantiate the use of different quantification procedures.

    Scaling involves sensing a product or stimulus and then generating a response that reflects how the person perceives the intensity or strength of one or more of the sensations generated by that product. This process is based on a psychophysical model (see Chapter 2). The psychophysical model states that as the physical strength of the stimulus increases (e.g., the energy of a light or sound or the concentration of a chemical stimulus) the sensation will increase in some orderly way. Furthermore, panelists are capable of generating different responses to indicate these changes in what they experience. Thus a systematic relationship can be modeled of how physical changes in the real world result in changing sensations.

    Scaling is a tool used for showing differences and degrees of difference among products. These differences are usually above the threshold level or just-noticeable difference. If the products are very similar and there is a question of whether there is any difference at all, the discrimination testing methods are more suitable (Chambers and Wolf, 1996). Scaling is usually done in one of two scenarios. In the first, untrained observers are asked to give responses to reflect changes in intensity and it is presumed that (1) they understand the attribute they are asked to scale, e.g., sweetness, and (2) there is no need to train or calibrate them to use the scale. This is the kind of scaling done to study a


    Fig. 7.1 The two processes involved in scaling. The first, physiological process is the psychophysical translation of energy in the outside world into sensation, i.e., conscious experience. The second is the translation of that experience into some response. The psychophysical process can be modified by physiological processes such as adaptation and masking. The judgment function can be modified by cognitive processes such as contextual effects, number usage, and other response biases.

    7.2 Some Theory

    Measurement theory tells us that numbers can be assigned to items in different ways. This distinction was popularized by S. S. Stevens, the major proponent of magnitude estimation (1951). At least four ways of assigning numbers to events exist in common usage. These are referred to as nominal scaling, ordinal scaling, interval scaling, and ratio scaling.

    In nominal scaling, numbers are assigned to events merely as labels. Thus gender may be coded as a “dummy variable” in statistical analysis by assigning a zero to males and a one to females; no assumption is made that these numbers reflect any ordered property of the sexes. They merely serve as convenient labels. The meals at which a food might be eaten could be coded with numbers as categories: one for breakfast, two for lunch, three for supper, and four for snacks. The assignment of a number for analysis is merely a label, a category or pigeonhole. The appropriate analysis of such data is to make frequency counts. The mode, the most frequent response, is used as a summary statistic for nominal data. Different frequencies of response for different products or circumstances can be compared by chi-square analysis or other nonparametric statistical methods (Siegel, 1956; see Appendix B). The only valid comparison between individual items with this scale is to say whether they belong to the same category or to different ones (an equal versus not equal decision).

    In ordinal scaling, numbers are assigned to recognize the rank order of products with regard to some sensory property, attitude, or opinion (such as preference). In this case increasing numbers assigned to the products represent increasing amounts or intensities of sensory experience. So a number of wines might be rank ordered for perceived sweetness or a number of fragrances rank ordered from most preferred to least preferred. In this case the numbers do not tell us anything about the relative differences among the products. We cannot draw conclusions about the degree of difference perceived nor the ratio or magnitude of difference. In an analogy to the order of runners finishing in a race, we know who placed first, second, third, etc. But this order indicates neither the finishing distances between contestants nor the differences in their elapsed times. In general, analyses of ranked data can report medians as the summary statistic for central tendency or other percentiles to give added information. As with nominal data, nonparametric statistical analyses (see Appendix B) are appropriate when ranking is done (Siegel, 1956).

    The next level of scaling occurs when the subjective spacing of responses is equal, so the numbers represent equal degrees of difference. This is called interval-level measurement. Examples in the physical sciences would be the centigrade and Fahrenheit scales of temperature. These scales have arbitrary zero points but equal divisions between values. The scales are inter-convertible through a linear transformation, for example, °C = 5/9 (°F - 32). Few scales used in sensory science have been subjected to tests that would help establish whether they achieved an interval level of measurement and yet this level is often assumed.
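    The summary statistics that go with the nominal, ordinal, and interval levels can be illustrated with a small Python sketch. All data values below are made up for illustration, and the function name is hypothetical; only the meal coding and the temperature conversion come from the text.

```python
import statistics

# Nominal: meal codes are labels only (1 breakfast, 2 lunch, 3 supper, 4 snack).
meal_codes = [1, 2, 2, 3, 4, 2, 1]          # hypothetical responses
print(statistics.mode(meal_codes))           # most frequent category

# Ordinal: ranks carry order but not spacing, so report the median.
sweetness_ranks = [1, 3, 2, 5, 4]            # hypothetical ranks
print(statistics.median(sweetness_ranks))


# Interval: arbitrary zero, equal divisions, linear interconversion.
def fahrenheit_to_celsius(f):
    return (f - 32.0) * 5.0 / 9.0


f_low, f_mid, f_high = 32.0, 50.0, 68.0
c_low, c_mid, c_high = (fahrenheit_to_celsius(f) for f in (f_low, f_mid, f_high))

# Ratios of differences survive the conversion...
print((f_high - f_mid) / (f_mid - f_low), (c_high - c_mid) / (c_mid - c_low))
# ...but ratios of the scale values themselves do not.
print(f_high / f_mid, c_high / c_mid)
```

    The last two lines show why statements such as "twice as hot" are not meaningful on an interval scale: the answer depends on which arbitrary zero point was chosen.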


    instructed to assign numbers in relative proportions that reflect the strength of their sensations (Stevens, 1956). Ratio instructions are easy to give, but whether the scale has ratio properties in reflecting a person’s actual subjective experiences is difficult, if not impossible, to determine.

    Because of these different measurement types with different properties, the sensory professional must be careful about two things. First, statements about differences or ratios in comparing the scores for two products should not be made when the measurement level is only nominal or ordinal. Second, it is risky to use parametric statistics for measurements that reflect only frequency counts or rankings (Gaito, 1980; Townsend and Ashby, 1980). Nonparametric methods are available for statistical analyses of such data.
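    One way to see the risk of treating ordinal codes as if they had interval properties is the following Python sketch. The ratings and the recoding are invented for illustration. Any order-preserving relabeling of the categories is equally legitimate for ordinal data, yet it can reverse a comparison of means while leaving the comparison of medians unchanged.

```python
import statistics

product_a = [2, 2, 2, 5]   # hypothetical category ratings
product_b = [3, 3, 3, 3]

# An order-preserving relabeling: only the top category code changes (5 -> 10).
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
a_recoded = [recode[x] for x in product_a]
b_recoded = [recode[x] for x in product_b]

# Means: product B is higher on the original codes, product A after recoding.
print(statistics.mean(product_a) < statistics.mean(product_b))
print(statistics.mean(a_recoded) < statistics.mean(b_recoded))

# Medians: product B is higher under both codings.
print(statistics.median(product_a) < statistics.median(product_b),
      statistics.median(a_recoded) < statistics.median(b_recoded))
```

    This is a constructed extreme case, not evidence about any particular scale, but it shows why rank-based summaries are the safer choice when only order can be assumed.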

    7.3 Common Methods of Scaling

    Several different scaling methods have been used to apply numbers to sensory experience. Some, like magnitude estimation, are adapted from psychophysical research, and others, like category scaling, have become popular through practical application and dissemination in a wide variety of situations. This section illustrates the common techniques of category scales, line marking, and magnitude estimation. The next section discusses the less frequently used techniques of hybrid category-ratio scales, indirect scales, and ranking as alternatives. Two other methods are illustrated. Intensity matching across sensory modalities, called cross-modality matching, was an important psychophysical technique and a precedent to some of the category-ratio scales. Finally, adjustable rating techniques in which panelists make relative placements and are able to alter their ratings are also discussed.

    7.3.1 Category Scales

    Perhaps the oldest method of scaling involves the choice of discrete response alternatives to signify increasing sensation intensity or degrees of liking and/or preference. The alternatives may be presented in a horizontal or vertical line and may offer choices of integer numbers, simple check boxes, or word phrases. Examples of simple category scales are shown in

    [Figure 7.2 panels: (a) Intensity, integers 1 (Weak) to 9 (Strong); (b) Oxidized: not noticeable; trace, not sure; faint; slight; mild; moderate; definite; strong; very strong; (d) Sweetness, check boxes from "Not sweet at all" to "Extremely sweet"; (e) Sweetness, check boxes from "Weaker" to "Stronger" around a reference marked R; (f) Hedonic scale for children: Super Good, Really Good, Good, Maybe Good or Maybe Bad, Bad, Really Bad, Super Bad]

    Fig. 7.2 Examples of category scales. (a) a simple integer scale for sensation strength (after Lawless and Malone, 1986b); (b) a verbal scale for degree of oxidized flavor (after Mecredy et al., 1974); (c) a verbal scale for degree of difference from some reference or control sample (after Aust et al., 1985); (d) a simple check-box scale for perceived intensity; (e) a simple check-box scale for difference in intensity from some reference sample, marked R (after Stoer and Lawless, 1993); (f) a facial scale suitable for use with children (after Chen et al., 1996).

    Fig. 7.2. The job of the consumer or panelist is to choose the alternative that best represents their reaction or sensation. In a category scale the number of alternative responses is limited. Seven to 15 categories are commonly used for intensity scaling, depending upon the application and the number of gradations that the panelists are able to distinguish in the products. As panel training progresses, perceptual discrimination of intensity levels will often improve and more scale points may be added to allow the panel to make finer distinctions.

    A key idea is to present an easily understandable word like “sweetness” and ask the participant to evaluate the perceived intensity of that attribute. A second important factor concerns the verbal labels that appear along the alternatives. At the very least, the low and high ends of the scale must be labeled with words that make sense, e.g., “not sweet at all” to “extremely sweet.”

    A wide variety of these scales have been used. A common version is to allow integer numerical responses of approximately nine points (e.g., Lawless and Malone, 1986a, b). Further gradations may be allowed. For example, Winakor et al. (1980) allowed options from 1 to 99 in rating attributes of fabric handfeel. In the Spectrum method (Meilgaard et al., 2006) a 15-point category scale is used, but intermediate points in tenths are allowed, rendering it (at least in theory) a 150-point scale. In hedonic or affective testing, a bipolar scale is common, with a zero or neutral point of opinion at the center (Peryam and Girardot, 1952). These are often shorter than the intensity scales. For example, in the “smiling face” scale used with children, only three options may be used for very young respondents (Birch et al., 1980, 1982), although with older children as many as nine points may be used (Chen et al., 1996; Kroll, 1990).

    Lately there has been a move away from using labels or integers, in case these may be biasing to subjects. People seem to have favorite numbers or tendencies to use some numbers more than others (e.g., Giovanni and Pangborn, 1983). A solution to this problem is to use an unlabeled check-box scale as shown in Fig. 7.2.

    In early applications of category scaling, the procedure specifically instructed subjects to use the categories to represent equal spacing. They might also be instructed to distribute their judgments over the available scale range, so the strongest stimulus was rated at the highest category and the weakest stimulus at the lowest category. The use of such explicit instructions surfaces from time to time. An example is in Anderson’s (1974) recommendation to show the subject specific examples of bracketing stimuli that are above and below the anticipated range of items in the set to be judged.
A related method is the relative scaling procedure of Gay and Mead (1992) in which subjects place the highest and lowest stimuli at the scale endpoints (discussed below). The fact that there is an upper boundary to the allowable numbers in a category scaling task may facilitate the achievement of a linear interval scale (Banks and Coleman, 1981). A related issue concerns what kind of experience the high end anchor refers to. Muñoz and Civille (1998) pointed out that for descriptive analysis panels, the high end-anchor phrase could refer to different situations. For example, is the term “extremely sweet”


    7.3.2 Line Scaling


    Fig. 7.3 Examples of line-marking scales: (a) with endpoints labeled, here “very poor” and “excellent” (after Baten, 1946); (b) with indented “goal posts,” here an aroma scale labeled “weak,” “moderate,” and “strong” (after Mecredy et al., 1974); (c) with additional points labeled as in ASTM procedure E-1083 (ASTM, 2008b), here a pepper heat scale labeled “0,” “threshold,” “slight,” “moderate,” and “strong”; (d) a line for ratings relative to a reference as in Stoer and Lawless (1993), here a sweetness scale labeled “weaker,” “reference,” and “stronger”; (e) hedonic scaling of overall opinion using a line labeled “dislike moderately,” “neither,” and “like moderately”; (f) the adjustable scale of Gay and Mead (1992), running from “least liked” to “most liked,” as pictured by Villanueva and Da Silva (2009).

    An important historical trend toward using lines in descriptive analysis was instrumental in the popularization of line marking. Stone et al. (1974) recommended the use of line marking for Quantitative Descriptive Analysis (QDA), then a relatively new approach to specifying the intensities of all important sensory attributes. It was important to have a scaling method in QDA that approximated an interval scale, as analysis of variance was to become the standard statistical technique for comparing products in descriptive analysis. The justification for the application in QDA appears to rest on the previous findings of Baten regarding the sensitivity of the method and the writings of Norman Anderson on functional measurement theory (Anderson, 1974). In his approach, Anderson used indented end anchors (see Weiss, 1972, for another example). Anderson also showed his subjects examples of the high and


    with a cursor that moves up and down via the computer mouse. Time-intensity methods are reviewed more fully in Chapter 8.

    7.3.3 Magnitude Estimation

    7.3.3.1 The Basic Techniques

    A popular technique for scaling in psychophysical studies has been the method of magnitude estimation. In this procedure, the respondent is instructed to assign numbers to sensations in proportion to how strong the sensation seems. Specifically, the ratios between the numbers are supposed to reflect the ratios of sensation magnitudes that have been experienced. For example, if product A is given the value of 20 for sweetness intensity and product B seems twice as sweet, B is given a magnitude estimate of 40. The two critical parts of the technique are the instructions given to the participant and the techniques for data analysis.

    Two primary variations of magnitude estimation have been used. In one method, a standard stimulus is given to the subject as a reference and that standard is assigned a fixed value such as 10. All subsequent stimuli are rated relative to this standard, sometimes called a “modulus.” It is often easier for panelists if the reference (i.e., the item used as the modulus) is chosen from somewhere near the middle of the intensity range. In the other variation of magnitude estimation, no standard stimulus is given and the participant is free to choose any number he or she wishes for the first sample. All samples are then rated relative to this first intensity, although in practice people probably “chain” their ratings to the most recent items in the series. Because people can choose different ranges of numbers in this “non-modulus” magnitude estimation, the data have to be treated to bring all judgments into the same range, an extra step in the analysis. Variations on magnitude estimation and guidelines for data analysis are found in ASTM Standard Test Method E 1697-05 (ASTM, 2008a).

    In the psychophysical laboratory, where magnitude estimation has found its primary usage, generally only one attribute is rated at a time. However, rating multiple attributes or profiling has been used in taste studies (McBurney and Bartoshuk, 1973; McBurney and Shick, 1971; McBurney et al., 1972) and this can


    naturally be extended to foods with multiple taste and aromatic attributes. Magnitude estimation has not been used very often for descriptive analysis, but in principle there is no reason why it could not be used for that purpose.

    Participants should be cautioned to avoid falling into previous habits of using bounded category scales, e.g., limited ranges of numbers from zero to ten. This may be a difficult problem with previously trained panels that have used a different scaling method, as people like to stick with a method they know and feel comfortable with. Panelists who show such behavior may not understand the ratio nature of the instructions. It is sometimes useful with a new panelist to have the participant perform a warm-up task to make sure they understand the scaling instructions. The warm-up task can involve estimation of the size or area of different geometric figures (Meilgaard et al., 2006) or the length of lines (McBurney et al., 1972). Sometimes it is desired to have panelists rate multiple attributes at the same time or to break down overall intensity into specific qualities. If this kind of “profiling” is needed, the geometric figures can include areas with different shading or the lines can be differently colored. A practice task is highly recommended so that the sensory scientist can check on whether the participant understands the task.

    Values of zero are allowed in this method, as some of the products may in fact have no sensation for a given attribute (like no sweetness in our example). Of course, the rating of zero should not be used for the reference material. While the value of zero is consistent with common sense for products with no sensation of some attributes, it does complicate the data analysis as discussed below.

    7.3.3.2 Instructions

    The visual appearance of the ballot in magnitude estimation is not critical; it is the instructions and the participant’s comprehension of the ratio nature of the judgments that are important. Some ballots even allow the subject/participant to view all previous ratings. Here are sample instructions for the use of magnitude estimation with a reference sample or modulus with a fixed number assigned to it:

        Please taste the first sample and note its sweetness. This sample is given the value of “10” for its sweetness intensity. Please rate all other samples in proportion to this reference. For example, if the next sample is twice as sweet, assign it a value of “20,” if half as sweet, assign it a value of “5,” and if 3.5 times as sweet, assign it a value of “35.” In other words, rate the sweetness intensity so that your numbers represent the ratios among the intensities of sweetness. You may use any positive numbers including fractions and decimals.

    The other major variation on this method uses no reference. In this case the instructions may read as follows:

        Please taste the first sample and note its sweetness. Please rate all other samples relative to this reference, applying numbers to the samples to represent the ratios of sweetness intensity among the samples. For example, if the next sample was twice as sweet, you would give it a number twice as big as the rating assigned to the first sample, if half as sweet, assign it a number half as big, and if 3.5 times as sweet, assign it a number 3.5 times as big. You may use any positive numbers including fractions and decimals.

    7.3.3.3 Data Treatment

    In non-modulus methods, participants will generally choose some range of numbers they feel comfortable with. The ASTM procedure suggests having them pick a value between 30 and 100 for the first sample, and avoiding any number that seems small. If participants are allowed to choose their own number range, it becomes necessary to re-scale each individual’s data to bring them into a common range before statistical analysis (Lane et al., 1961). This will prevent subjects who choose very large numbers from having undue influence on measures of central tendency (means) and in statistical tests. This rescaling process has been referred to as “normalizing” (ASTM, 2008a) although it has nothing to do with the normal distribution or Z-scores. A common method for rescaling proceeds as follows:

    (1) Calculate the geometric mean of each individual’s ratings across their data set.
    (2) Calculate the geometric mean of the entire data set (of all subjects combined).
    (3) For each subject, construct a ratio of the grand geometric mean of the entire data set to that person’s geometric mean. The value of this ratio provides a post hoc individual rescaling factor for each subject. In place of the grand geometric mean, any positive numerator may also be chosen in constructing this factor, e.g., a value of 100.
    (4) Multiply each data point for a given person by their individual rescaling factor. Do this for all participants using their own individual re-scaling factors.

    These re-scaled data are then analyzed. Note that due to the extra data treatment step in this method, it is simpler to use the modulus-based variation with a standard reference item.

    Magnitude estimation data are often transformed to logs before data analysis (Butler et al., 1987; Lawless, 1989). This is done primarily because the data tend to be positively skewed or log-normally distributed. There tend to be some high outlying values for any given sample. Perhaps this is not surprising because the scale is open-ended at the top, and bounded by zero at the bottom. Transformation into log data and/or taking geometric means presents some problems, however, when the data contain zeros. The log of zero is undefined. Any attempt to take a geometric mean by calculating the product of N items will yield a zero on multiplying. Several approaches have been taken to this problem. One is to assign a small positive value to any zeros in the data, perhaps one-half of the smallest rating given by a subject (ASTM, 2008a). The resulting analysis, however, will be influenced by this choice. Another approach is to use the median judgments in constructing the normalization factor for non-modulus methods. The median is less influenced by the high outliers in the data than the arithmetic mean.
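    The four rescaling steps above, together with the zero-replacement option, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout of the panel data, and the example ratings are hypothetical, and “one-half of the smallest rating” is read here as one-half of the smallest nonzero rating of that subject.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive numbers, computed through logs."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def replace_zeros(ratings):
    """Assumption: zeros become half of the subject's smallest nonzero rating."""
    positives = [r for r in ratings if r > 0]
    if not positives:
        raise ValueError("subject has no nonzero ratings")
    floor = min(positives) / 2.0
    return [r if r > 0 else floor for r in ratings]

def rescale(panel, numerator=None):
    """Rescale each subject's ratings to a common range.

    panel: dict mapping a subject id to a list of magnitude estimates.
    numerator: optional fixed positive value (e.g., 100) used in place
    of the grand geometric mean.
    """
    cleaned = {s: replace_zeros(r) for s, r in panel.items()}
    if numerator is None:
        # Step 2: geometric mean of the whole data set.
        pooled = [r for ratings in cleaned.values() for r in ratings]
        numerator = geometric_mean(pooled)
    rescaled = {}
    for subject, ratings in cleaned.items():
        # Steps 1 and 3: individual geometric mean and rescaling factor.
        factor = numerator / geometric_mean(ratings)
        # Step 4: multiply every rating by the subject's own factor.
        rescaled[subject] = [r * factor for r in ratings]
    return rescaled

# Hypothetical data: two subjects who report the same ratios
# but chose number ranges that differ by a factor of ten.
panel = {
    "subject_1": [10.0, 20.0, 40.0],
    "subject_2": [100.0, 200.0, 400.0],
}
rescaled = rescale(panel)
```

    After rescaling, the two hypothetical subjects have identical ratings, and the 1:2:4 ratios within each subject are unchanged, because each subject's data are multiplied by a single constant.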

    7.3.3.4 Applications

    For practical purposes, the method of magnitude estimation may be used with trained panels, consumers, and even children (Collins and Gescheider, 1989). However, the data do tend to be a bit more variable than other bounded scaling methods, especially in the hands of untrained consumers (Lawless and Malone, 1986b). The unbounded nature of the scale may make it especially well suited to sensory attributes where an upper boundary might impose restrictions on the panelists’ ability to differentiate very intense sensory experiences in their ratings. For example, irritative or painful sensations such as chili pepper intensity might all be rated near the upper bound of a category scale of intensity, but an open-ended magnitude estimation procedure would allow panelists more freedom to differentiate and report variations among very intense sensations.

    With hedonic scaling of likes and dislikes, there is an additional decision in using magnitude estimation scaling. Two options have been adopted in using this technique, one employing a single continuum or unipolar scale for amount of liking and the other applying a bipolar scale with positive and negative numbers plus a neutral point (Pearce et al., 1986). In bipolar magnitude scaling of likes and dislikes, positive and negative numbers are allowed in order to signify ratios or proportions of both liking and disliking (e.g., Vickers, 1983). An alternative to positives and negatives is to have the respondent merely indicate whether the number represents liking or disliking (Pearce et al., 1986). In unipolar magnitude estimation only positive numbers (and sometimes zeros) are allowed, with the lower end of the scale representing no liking and higher numbers given to represent increasing proportions of liking (Giovanni and Pangborn, 1983; Moskowitz and Sidel, 1971). It is questionable whether a unipolar scale is a sensible response task for the participant, as it does not recognize the fact that a neutral hedonic response may occur, and that there are clearly two modes of reaction, one for liking and one for disliking. If one assumes that all items are on one side of the hedonic continuum, either all liked to varying degrees or all disliked to varying degrees, then the one-sided scale makes sense. However, it is a rare situation with foods or consumer product testing in which at least some indifference or change of opinion is not visible in at least some respondents. So a bipolar scale fits common sense.

    7.4 Recommended Practice and Practical Guidelines

    Both line scales and category scales may be used effectively in sensory testing and consumer work. So we will not expend much effort in recommending one of these two common techniques over another. Some practical concerns are given below to help the student or practitioner avoid some potential problems. The category-ratio or labeled magnitude scales may facilitate comparisons among different groups, and this issue is discussed below in Section 7.5.2.


    7.4.1 Rule 1: Provide Sufficient Alternatives

    One major concern is to provide sufficient alternatives to represent the distinctions that are possible by panelists (Cox, 1980). In other words, a simple 3-point scale may not suffice if the panel is highly trained and capable of distinguishing among many levels of the stimuli. This is illustrated in the case of the flavor profile scale, which began with five points to represent no sensation, threshold sensation, weak, moderate, and strong (Caul, 1957). It was soon discovered that additional intermediate points were desirable for panelists, especially in the middle range of the scale where many products would be found. However, there is a law of diminishing returns in allowing too many scale points: further elaboration allows better differentiation of products up to a point, and then the gains diminish as the additional response choices merely capture random error variation (Bendig and Hughes, 1953).

    A related concern is the tendency, especially in consumer work, to simplify the scale by eliminating options or truncating endpoints. This brings in the danger caused by end-use avoidance. Some respondents may be reluctant to use the end categories, just in case a stronger or weaker item may be presented later in the test. So there is some natural human tendency to avoid the end categories. Truncating a 9-point scale to a 7-point scale may leave the evaluator with what is functionally only a 5-point scale for all practical purposes. So it is best to avoid this tendency to truncate scales in experimental planning.

    7.4.2 Rule 2: The Attribute Must Be Understood

    Intensity ratings must be collected on an attribute which the participants understand and about which they have a general consensus and agreement as to its meaning. Terms like sweetness are almost universal but a term like “green aroma” might be interpreted in different ways. In the case of a descriptive panel, a good deal of effort may be directed at using reference standards to illustrate what is meant by a specific term. In the case of consumer work, such training is not done, so if any intensity ratings are collected, they must be about simple terms about which people generally agree. Bear in mind that most early psychophysics was done on simple attributes like the loudness of a sound or the heaviness of a weight. In the chemical senses, with their diverse types of sensory qualities and fuzzy consumer vocabulary, this is not so straightforward.

    Other problems to avoid include mixing sensation intensity (strength) and hedonics (liking), except in the just-right scale where this is explicit. An example of a hedonically loaded sensory term is the adjective “fresh.” Whatever this means to consumers, it is a poor choice for a descriptive scale because it is both vague and connotes some degree of liking or goodness to most people. Vague terms are simply not actionable when it comes to giving feedback to product developers about what needs to be fixed. Another such vague term that is popular in consumer studies is “natural.” Even though consumers might be able to score products on some unknown basis using this word, the information is not useful as it does not tell formulators what to change if a product scores low. A similar problem arises with attempting to scale “overall quality.” Unless quality has been very carefully defined, it cannot be scaled.

    7.4.3 Rule 3: The Anchor Words Should Make Sense

    In setting up the scales for descriptive analysis or for a consumer test, the panel leader should carefully consider the nature of the verbal end anchors for each scale as well as any intermediate anchors that may be needed. Should the scale be anchored from “very weak” to “very strong” or will there be cases in which the sensory attribute is simply not present? If so, it makes sense to verbally anchor the bottom of the scale with “not at all” or “none.” For example, a sweetness scale could be anchored with “not at all sweet” and a scale for “degree of oral irritation” could be anchored using “none.”


    7.4.5 A Warning: Grading and Scoring are Not Scaling

    In some cases pseudo-numerical scales have been set up to resemble category scales, but the examples cut across different sensory experiences, mixing qualities. An example is the pseudo-scale used for baked products, where the number 10 is assigned for perfect texture, 8 for slight dryness, 6 for gumminess, and 4 if very dry (AACC, 1986). Gumminess and dryness are two separate attributes and should be scaled as such. This is also an example of quality grading, which is not a true scaling procedure. When the numbers shift among different sensory qualities, this violates the psychophysical model for scaling the intensity of a single attribute. Although numbers may be applied to the grades, they cannot be treated statistically, as the average of “very dry” (4) and “slightly dry” (8) is not “gummy” (6) (see Pangborn and Dunkley, 1964, for a critique of this in the dairy grading arena). The numbers in a quality-grading scheme do not represent any kind of unitary psychophysical continuum.
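    The averaging problem described above can be made concrete with a short Python sketch. The grade codes are the ones quoted from the baked-product example; the two scored samples and the variable names are hypothetical.

```python
# Grade codes from the baked-product pseudo-scale quoted in the text.
grade_labels = {
    10: "perfect texture",
    8: "slight dryness",
    6: "gumminess",
    4: "very dry",
}

# Hypothetical scores: one sample graded "very dry", one "slight dryness".
scores = [4, 8]
mean_score = sum(scores) / len(scores)

# The arithmetic mean lands on the code for a different attribute entirely.
mean_label = grade_labels[int(mean_score)]
print(mean_score, mean_label)
```

    The mean of two dryness grades falls on the code for gumminess, a quality that neither sample was judged to have, which is why such grades should not be averaged as if they lay on a single intensity continuum.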

    7.5 Variations: Other Scaling Techniques

    An important idea in scaling theory is the notion that people may have a general idea of how weak or strong sensations are, and that they can compare different attributes of a product for their relative strength, even across different sensory modalities. So, for example, someone could legitimately say that this product tastes much more salty than it is sweet, or that the trumpets are twice as loud as the flutes in a certain passage in a symphony. Given that this notion is correct, people would seem to have a general internal scale for the strength of sensations. This idea forms the basis for several scaling methods. It permits the comparison of different sensations cross-referenced by their numerical ratings, and even can be used to compare word responses. Methods derived from this idea are discussed next.

    7.5.1 Cross-Modal Matches and Variations on Magnitude Estimation

    The method of magnitude estimation has a basis in earlier work such as fractionation methods and the method of sense ratios in the older literature (Boring, 1942) where people would be asked to set one stimulus in a given sensation ratio to another. The notion of allowing any numbers to be generated in response to stimuli, rather than adjusting stimuli to represent fixed numbers, appeared somewhat later (Moskowitz, 1971; Richardson and Ross, 1930; Stevens, 1956). An important outcome of these studies was the finding that the resulting psychophysical function generally conformed to a power law of the following form:

    $$R = kI^{n} \tag{7.1}$$

    or after log transformation:

    $$\log(R) = n \log(I) + \log(k) \tag{7.2}$$

    where $R$ was the response, e.g., perceived loudness (mean or geometric mean of the data), $I$ was the physical stimulus intensity, e.g., sound pressure, and $k$ was a constant of proportionality that depends upon the units of measurement. The important characteristic value of any sensory system was the value for $n$, the exponent of the power function or slope of the straight line in a log-log plot (Stevens, 1957). The validity of magnitude estimation then came to hang on the validity of the power law: the methods and resulting functions formed an internally consistent theoretical system. Stevens also viewed the method as providing a direct window into sensation magnitude and did not question the idea that these numbers generated by subjects might be biased in some way. However, the generation of responses is properly viewed as combining at least two processes, the psychophysical transformation of energy into conscious sensation and the application of numbers to those sensations. This response process was not given due consideration in the early magnitude estimation work. Responses as numbers can exhibit nonlinear transformations of sensation (Banks and Coleman, 1981; Curtis et al., 1968) so the notion of a direct translation from sensation to ratings is certainly a dangerous oversimplification.

    Ratio-type instructions have been applied to other techniques as well as to magnitude estimation. A historically important psychophysical technique was that of cross-modality matching, in which the sensation levels or ratios would be matched in two sensory continua such as loudness and brightness. One continuum would be adjusted by the experimenter and the other by the subject. For example, one would try to make the brightness of the lights about in the same proportions as the loudness of the sounds. Stevens (1969) proposed that these experiments could validate the power law, since the exponents of the cross-modality matching function could be predicted from the exponents derived from separate scaling experiments. Consider the following example: For one sensory attribute (using the log transform, Eq. (7.2)),

    $$\log R_1 = n_1 \log I_1 + \log k_1 \tag{7.3}$$

    and for a second sensory attribute

    $$\log R_2 = n_2 \log I_2 + \log k_2 \tag{7.4}$$

    Setting $R_1 = R_2$ in cross-modality matching gives

    $$n_1 \log I_1 + \log k_1 = n_2 \log I_2 + \log k_2 \tag{7.5}$$

    and rearranging,

    $$\log I_1 = (n_2/n_1) \log I_2 + \text{a constant} \tag{7.6}$$

    If one plots $\log I_1$ against $\log I_2$ from a cross-modality matching task, the slope of the function can be predicted from the ratio of the individual exponents (i.e., $n_2/n_1$, which you can derive from two separate magnitude estimation tasks). This prediction holds rather well for a large number of compared sensory continua (Stevens, 1969). However, whether it actually provides a validation for the power law or for magnitude estimation has been questioned (e.g., Ekman, 1964).

    For practical purposes, it is instructive that people can actually take disparate sensory continua and compare them using some generalized notion of sensory intensity. This is one of the underpinnings of the use of a universal scale in the Spectrum descriptive procedure (Meilgaard et al., 2006). In that method, different attributes are rated on 15-point scales that can (in theory) be meaningfully compared. In other words, a 12 in sweetness is twice as intense a sensation as a 6 in saltiness. Such comparisons seem to make sense for tastes and flavors but may not cut across all other modalities. For example, it might seem less sensible to compare the rating given for the amount of chocolate chips in a cookie to the rating given for the cookie’s hardness; these seem like quite different experiences to quantify.
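    The prediction of the cross-modality slope from two separately estimated exponents can be illustrated with a small Python sketch. Everything numerical here is hypothetical: the intensities, the exponents 0.6 and 0.3, and the constants 2 and 5 are made-up, noise-free values chosen only to show the arithmetic, and the helper name fit_power_law is not from the source. The fit is an ordinary least-squares line through the log-log data, following Eq. (7.2).

```python
import math

def fit_power_law(intensities, responses):
    """Least-squares fit of log R = n log I + log k; returns (n, k)."""
    xs = [math.log10(i) for i in intensities]
    ys = [math.log10(r) for r in responses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                        # slope of the log-log line
    k = 10 ** (mean_y - n * mean_x)      # intercept converted back from logs
    return n, k

# Hypothetical, noise-free magnitude estimates for two attributes.
intensities = [1.0, 2.0, 4.0, 8.0]
responses_1 = [2.0 * i ** 0.6 for i in intensities]  # assumed n1 = 0.6
responses_2 = [5.0 * i ** 0.3 for i in intensities]  # assumed n2 = 0.3

n1, k1 = fit_power_law(intensities, responses_1)
n2, k2 = fit_power_law(intensities, responses_2)

# Predicted slope of log I1 against log I2 in a matching task, Eq. (7.6).
predicted_slope = n2 / n1
# The constant in Eq. (7.6), obtained by rearranging Eq. (7.5).
constant = (math.log10(k2) - math.log10(k1)) / n1
```

    With these made-up values the predicted slope is 0.5. In real data the responses would typically be geometric means across panelists, and the fitted exponents would carry sampling error that propagates into the predicted slope.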


    or line scale? The idea of scaling word phrases takes shape in the labeled magnitude scales discussed next.

    7.5.2 Category-Ratio (Labeled Magnitude) Scales

    A group of hybrid techniques for scaling has recently enjoyed some popularity in the study of taste and smell, for hedonic measurement and other applications. One of the problems with magnitude estimation data is that they do not tell in any absolute sense whether sensations are weak or strong, only giving the ratios among them. This group of scales attempts to provide ratio information, but combines it with common verbal descriptors along a line scale to provide a simple frame of reference. They are referred to as category-ratio scales, or more recently, labeled magnitude scales. They all involve a horizontal or vertical line with deliberately spaced labels and the panelists’ task is to make a mark somewhere along the line to indicate the strength of their perception or strength of their likes or dislikes. In general, these labeled line scales give data that are consistent with those from magnitude estimation (Green et al., 1993). An unusual characteristic of these scales is the verbal high end-anchor phrase, which often refers to the “strongest imaginable.”

    The technique is based on early work by Borg and colleagues, primarily in the realm of perceived physical exertion (Borg, 1982, 1990; see Green et al., 1993). In developing this scale, Borg assumed that the semantic descriptors could be placed on a ratio scale, that they defined the level of perceptual intensity, and that all individuals experienced the same perceptual range. Borg suggested that for perceived exertion, the maximal sensation is roughly equivalent across people for this sensory “modality” (Marks et al., 1983). For example, it is conceivable that riding a bicycle to the point of physical exhaustion produces a similar sensory experience for most people. So the scale came to have the highest label referring to the strongest sensation imaginable. This led to the development of the labeled magnitude scale (LMS) shown in Fig. 7.4. It is truly a hybrid method since the response is a vertical line-marking task but verbal anchors are spaced according to calibration using ratio-scaling instructions (Green


    Fig. 7.4 The labeled magnitude scale (LMS) of Green et al. (1993). The verbal labels are placed along a vertical line at the following scale values: strongest imaginable (95.5), very strong (50.12), strong (33.1), moderate (16.2), weak (5.8), barely detectable (1.4).

    2007; Lawless et al., 2010a, b, c). A growing number of similar scales have been developed for various applications including oral pleasantness/unpleasantness (the “OPUS” scale, Guest et al., 2007), perceived satiety (the “SLIM” scale, Cardello et al., 2005), clothing fabric comfort (the “CALM” scale, Cardello et al., 2003), and odor dissimilarity (Kurtz et al., 2000). All of these scales depend upon a ratio scaling task to determine the spacing of the verbal descriptors and almost all use a Borg-type high end-anchor phrase. Others will surely be developed.

    Instructions to participants have differed in the use of these scales. In the first application of the LMS, Green et al. (1993) instructed subjects to first choose the most appropriate verbal descriptor, and then to “fine tune” their judgment by placing a mark on the line between that descriptor and the next most appropriate one. In current practice less emphasis may be placed on the consideration of the verbal labels and instructions may be given to simply make a mark “anywhere” on the line. A common observation with the hedonic versions of the scale is that some panelists will mark at or very near a verbal descriptor, seeming to use it as a category scale (Cardello et al., 2008; Lawless et al., 2010a). The proportion of people displaying this behavior may depend upon the physical length of the line (and not the instructions or examples that may be shown) (Lawless et al., 2010b).

    Results may depend in part on the nature of the high end-anchor example and the frame of reference of the subject in terms of the sensory modality they are thinking about. Green et al. (1996) studied the application of the LMS to taste and odor, using descriptors for the upper bound as “strongest imaginable” taste, smell, sweetness, etc. Steeper functions (a smaller response range) were obtained when mentioning individual taste qualities. This appears to be due to the omission of painful experiences (e.g., the “burn of hot peppers”) from the frame of reference when sensations were scaled relative to only taste. The steepening of the functions for the truncated frame of reference is consistent with the idea that subjects expanded their range of numbers as seen in other scaling experiments (e.g., Lawless and Malone, 1986b). The fact that subjects appear to adjust their perceptual range depending on instructions or frame of reference suggests that the scales have relative and not absolute properties, like most other scaling methods. Cardello et al. (2008) showed that the hedonic version of the scale (the LAM

    164 Greatest imaginable like

    Most imaginable

    (+100)

    Scaling (+100)

    Like Extremely

    (+74.2)

    extremely

    (+75.2)

    Like Very Much

    (+56.1)

    very

    (+56.1)

    Like Moderately

    (+36.2)

    moderately

    (+33.9)

    Like Slightly

    (+11.2)

    slightly weakly

    (+19.7) (+15.0)

    PLEASANT

    Fig. 7.5 Affective labeled magnitude scales, including the LAM scale (Cardello and Schutz, 2004) and the OPUS scale (Guest et al., 2007).

    7

    barely detectable Neither like nor dislike

    (+6.5) (0)

    (0)

    (+6.5)

    (-10.6)

    barely detectable weakly slightly

    (+15.0) (+19.7)

    Dislike moderately

    (-31.9)

    moderately

    (+33.9)

    Dislike Very much

    (-55.5)

    very

    (+56.1)

    Dislike extremely

    (-75.5)

    extremely

    (+75.2)

    Greatest imaginable dislike

    (-100)

    LAM scale) will also show such range effects. A compssed range of responses is obtained when the frame of reference is greatest imaginable like (dislike) for an “experience of any kind” rather than something more delimited like “foods and beverages.” Apparently the compssion is not very detrimental to the ability of the LAM scale to differentiate products (Cardello et al., 2008, but see also Lawless et al., 2010a). Is this a suitable method for cross-subject comparisons? To the extent that Borg’s assumptions of common perceptual range and the similarity of the high end-anchor experience among people are true, the method might provide one approach to valid comparisons of the ratings among different respondents. This would facilitate comparisons of clinical groups or patients with sensory disorders or genetically different inpiduals such as anosmics, PTC/PROP taster groups. Bartoshuk and colleagues (1999, 2003,

    UNPLEASANT

    Dislike slightly

    Most imaginable

    (-100)

    OPUS 2004a, b, 2006) have argued that the labeled magnitude scales should anchor their endpoints to “sensations of any kind” as such a reference experience would allow different inpiduals to use the scale in similar ways/and thus facilitate inter-inpidual comparisons. Scales with this kind of high end anchor have been termed “generalized” labeled magnitude scales (or gLMS). However, the sensory evaluation practitioner should be aware of the compssion effects that can occur with this kind of scale, which could potentially lead to lessened differentiation among products.

    7.5.3 Adjustable Rating Techniques: Relative Scaling

    A few methods have been tried that allow consumers or panelists to change their ratings. An example is


    7.5.4 Ranking

    Another alternative to traditional scaling is the use of ranking procedures. Ranking is simply ordering the products from weakest to strongest on the stated


    7.5.5 Indirect Scales

    A conservative approach to scaling is to use the variance in the data as units of measurement, rather than the numbers taken at face value. For example, we could ask how many standard deviations apart the mean values for two products are. This is a different approach to measurement than simply asking how many scale units separate the means on the response scale. On a 9-point scale, one product may receive a mean rating of seven and another a mean rating of nine, making the difference two scale units. If the pooled standard deviation is two units, however, they would only be one unit apart on a variability-based scale. As one example, Conner and Booth (1988) used both the slope and the variance of functions from just-right scales to derive a “tolerance discrimination ratio.” This ratio represents a measure of the degree of difference along a physical continuum (such as concentration of sugar in lime drink) that observers find to make a meaningful change in their ratings of difference-from-ideal (or just-right). This is analogous to finding the size of a just-noticeable difference, but translated into the realm of hedonic scaling. Their insight was that it is not only the slope of the line that is important in determining this tolerance or liking-discrimination function, but also the variance around that function. Variability-based scales are the basis for scaling in Thurstone’s models for comparative judgment (Thurstone, 1927) and its extension into determining distances between category boundaries (Edwards, 1952). Since the scale values can be found from choice experiments as well as rating experiments, the technique is quite flexible. How this type of scaling can be applied to rating data is discussed below in the derivation of the 9-point hedonic scale words. When the scale values are derived from a choice method like the triangle test or paired comparison method, this is sometimes called “indirect scaling” (Baird and Noma, 1978; Jones, 1974). 
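    The arithmetic of the 9-point example above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the panel sizes used for pooling are assumptions and do not come from any of the cited studies.

```python
import math


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two independent sets of ratings."""
    weighted = (n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2
    return math.sqrt(weighted / (n_a + n_b - 2))


def difference_in_sd_units(mean_a, mean_b, sd):
    """Separation of two product means expressed in standard deviation units."""
    return (mean_a - mean_b) / sd


# Worked example from the text: means of 9 and 7 on a 9-point scale.
# The panel sizes (30 and 30) are hypothetical.
sd = pooled_sd(2.0, 30, 2.0, 30)
separation = difference_in_sd_units(9.0, 7.0, sd)
print(f"scale-unit difference: 2.0, variability-based difference: {separation:.1f}")
```

    The same two-unit difference in means would shrink to half a unit on the variability-based scale if the pooled standard deviation were four, which is the sense in which noisy data reduce the measured difference.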
The basic operation in Thurstonian scaling of choice data is to convert the proportion correct in a choice experiment (or simply the proportions in a two-tailed test like paired preference) to Z-scores. The exact derivation depends upon the type of test (e.g., triangle versus 3-AFC) and the cognitive strategy used by the subject. Tables for Thurstonian scale values from various tests such as the triangle test were given by Frijters et al. (1980) and Bi (2006) and some tables are given in the Appendix. Mathematical details of Thurstonian scaling are discussed in Chapter 5. Deriving measures of sensory differences in such indirect ways presents several problems in applied sensory evaluation, so the method has not been widely used. The first problem is one of economy in data collection. Each difference score is derived from a separate discrimination experiment such as a paired comparison test. Thus many subjects must be tested to get a good estimate of the proportion of choice, and this yields just one scale value. In direct scaling, each participant gives at least one data point for each item tasted. Direct scaling allows for easy comparisons among multiple products, while the discrimination test must be done on one pair at a time. Thus the methods of indirect scaling are not cost-efficient. A second problem can occur if the products are too clearly different on the attribute in question, because then the proportion correct will approach 100%. At that point the scale value is undefined, because the products are some unknown number of standard deviations apart. So the method only works when there are small differences and some confusability of the items. In a study of many products, however, it is sometimes possible to compare only adjacent or similar items, e.g., products that differ in small degrees of some ingredient or process variable. This approach was taken by Yamaguchi (1967) in examining the synergistic taste combination of monosodium glutamate and disodium 5′-inosinate. Many different levels of the two ingredients were tasted, but because the differences between some levels were quite apparent, an incomplete design was used in which only three adjacent levels were compared.

    Other applications have also used this notion of variability as a yardstick for sensory difference or sensation intensity. The original approach of Fechner in constructing a psychophysical function was to accumulate the difference thresholds or just-noticeable differences (JNDs) in order to construct the log function of psychophysical sensory intensity (Boring, 1942; Jones, 1974). McBride (1983a, b) examined whether JND-based scales might give similar results to category scales for taste intensity. 
Both types of scaling yielded similar results, perhaps not surprising since both tend to conform to log functions. In a study of children’s preferences for different odors Engen (1974) used a paired preference paradigm, which was well suited to the abilities of young children to respond in a judgment task. He then converted the paired preference proportions to Thurstonian scale values via Z-scores and was able to show that the hedonic range of children was smaller than that of adults. Another example of choice data that can be converted to scale values is best-worst scaling, in which a consumer is asked to choose the best liked and least liked samples from a set of three or more items (Jaeger et al., 2008). With three products, it can be considered a form of a ranking task. When applied to sensory intensity, this is sometimes known as maximum-difference or “max-diff.” Best-worst scaling is also discussed in Section 13.7. Simple difference scores may be calculated based on the number of times an item is called best versus worst and these scores are supposed to have interval properties. If a multinomial logistic regression is performed on the data, they are theorized to have true ratio properties (Finn and Louviere, 1992). A practical problem with the method, however, is that so many products must be tasted and compared, rendering it difficult to perform with foods (Jaeger and Cardello, 2009).

    The sensory professional should bear in mind that in spite of their theoretical sophistication, the indirect methods are based on variability as the main determinant of degree of difference. Thus any influence that increases variability will tend to decrease the measured differences among products. In the well-controlled psychophysical experiment under constant standard conditions across sessions and days, this may not be important, because the primary variability lies in the resolving power of the participant (and secondarily in the sample products). But in drawing conclusions across different days, batches, panels, factories, and such, one has a less pure situation to consider. Whether one considers the Thurstonian-type indirect measures comparable across different conditions depends upon the control of extraneous variation.
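    Two of the computations described in this section can be sketched with the Python standard library. The sketch makes several assumptions: the function names and the example choices are hypothetical, the Z-score step is the plain inverse normal transform (the test-specific corrections tabled by Frijters et al. and Bi are not applied), and the best-worst score is the simple count of "best" minus "worst" choices.

```python
from collections import Counter
from statistics import NormalDist, StatisticsError


def proportion_to_z(p):
    """Standard normal deviate for a choice proportion.

    Returns None when p is 0 or 1, where the scale value is undefined.
    """
    try:
        return NormalDist().inv_cdf(p)
    except StatisticsError:
        return None


def best_minus_worst(choices):
    """Difference scores from (best_item, worst_item) pairs, one pair per trial.

    Items that were never chosen as best or worst do not appear in the result.
    """
    best = Counter(b for b, _ in choices)
    worst = Counter(w for _, w in choices)
    return {item: best[item] - worst[item] for item in set(best) | set(worst)}


# Hypothetical paired preference proportions
for p in (0.50, 0.75, 1.00):
    print(p, proportion_to_z(p))

# Hypothetical best-worst trials with three products A, B, and C
print(best_minus_worst([("A", "C"), ("A", "B"), ("B", "C")]))
```

    The None returned for a proportion of 1.0 mirrors the second problem noted above: once products are perfectly discriminated, their separation in standard deviation units cannot be estimated.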

    7.6 Comparing Methods: What is a Good Scale?

    A large number of empirical studies have been conducted comparing the results using different scaling methods (e.g., Birnbaum, 1982; Giovanni and Pangborn, 1983; Hein et al., 2008; Jaeger and Cardello, 2009; Lawless and Malone, 1986a, b; Lawless et al., 2010a; Marks et al., 1983; Moskowitz and Sidel, 1971; Pearce et al., 1986; Piggot and Harper, 1975; Shand et al., 1985; Vickers, 1983; Villanueva and Da Silva, 2009; Villanueva et al., 2005). Because scaling data are often used to identify differences between products, the ability to detect differences is one important practical criterion for how useful a scaling method can be (Moskowitz and Sidel, 1971). A related criterion is the degree of error variance or similar measures such as size of standard deviations or coefficients of variation. Obviously, a scaling method with low interindividual variability will result in more sensitive tests, more significant differences, and lower risk of Type II error (missing a true difference). A related issue is the reliability of the procedure. Similar results should be obtained upon repeated experimentation.

    Other practical considerations are important as well. The task should be user friendly and easy to understand for all participants. Ideally, the method should be applicable to a wide range of products and questions, so that the respondent is not confused by changes in response type over a long ballot or questionnaire. If panelists are familiar with one scale type and are using it effectively, there may be some liability in trying to introduce a new or unfamiliar method. Some methods, like category scales, line scales, and magnitude estimation, can be applied to both intensity and hedonic (like-dislike) responses. The amount of time required to code, tabulate and process the information may be a concern, depending upon computer-assisted data collection and other resources available to the experimenters. As with any method, validity and accuracy are also issues. Validity can only be judged by reference to some external criterion. For hedonic scaling, one might want the method to correspond to other behaviors such as choice or consumption (Lawless et al., 2010a). A related criterion is the ability of the scale to identify or uncover consumer segments with different preferences (Villanueva and Da Silva, 2009). Given these practical considerations, we may then ask how the different scaling methods fare. 
Most published studies have found about equal sensitivity for the different scaling methods, provided that the methods are applied in a reasonable manner. For example, Lawless and Malone (1986a, b) performed an extensive series of studies (over 20,000 judgments) with consumers in central location tests using different sensory continua including olfaction, tactile, and visual modalities. They compared line scales, magnitude estimation, and category scales. Using the degree of statistical differentiation among products as the criterion for utility of the methods, the scales performed about equally well. A similar conclusion was reached by Shand et al. (1985) for trained panelists. There was some small tendency for magnitude estimation to be marginally more variable in the hands of consumers as opposed to college students (Lawless and Malone, 1986b). Statistical differentiation increased over replicates, as would be expected as people came to understand the range of items to be judged (see Hein et al., 2008, for another example of improvement over replication in hedonic scaling). Similar findings for magnitude estimation and category scales in terms of product differentiation were reported by Moskowitz and Sidel (1971), Pearce et al. (1986), Shand et al. (1985), and Vickers (1983), although the forms of the mathematical relations to underlying physical variables were often different (Piggot and Harper, 1975). In other words, as found by Stevens and Galanter (1957), there is often a curvilinear relationship between the data from the two methods. However, this has not been universally observed and sometimes simple linear relationships have been found (Vickers, 1983). Similar results for category scales and line scales were found by Mattes and Lawless (1985).

    Taken together, these empirical studies paint a picture of much more parity among methods than one might suppose given the number of arguments over the validity of scaling methods in the psychological literature. With reasonable spacing of the products and some familiarization with the range to be expected, respondents will distribute their judgments across the available scale range and use the scale appropriately to differentiate the products. A reasonable summary of the literature comparing scale types is that they work about equally well to differentiate products, given a few sensible precautions.

    7.7 Issues

    responses (i.e., judgments that might fall in-between categories). The line scale offers a continuously graded choice of alternatives, limited only by the measurement abilities in data tabulation. Baten also noted that the line scale seemed to facilitate a relative comparison among the products. This was probably due to his placement of the scales one above the other on the ballot, so judges could see both markings at the same time. In order to minimize such contextual effects it is now more common to remove the prior ratings for products to achieve a more independent judgment of the products. However, whether that is ever achieved in practice is open to question, since humans are naturally comparative when asked to evaluate items, as discussed in Chapter 9. Furthermore, there may be potential for increased discrimination in methods like the relative positioning technique. The naturally comparative nature of human judgment may be something we could benefit from rather than trying to fight this tendency by over-calibration.

    7.7.2 Should Category Rating Scales Be Assigned Integer Numbers in Data Tabulation? Are They Interval Scales?

    There is also a strong suspicion that many numerical scaling methods may produce only ordinal data, because the spacing between alternatives is not subjectively equal. A good example is the common marketing research scale of “excellent-very good-good-fair-poor.” The subjective spacing between these adjectives is quite uneven. The difference between two products rated good and very good is a much smaller difference than that between products rated fair and poor. However, in analysis we are often tempted to assign the numbers one through five to these categories and take means and perform statistics as if the assigned numbers reflected equal spacing. This is a pretense at best. A reasonable analysis of the 5-point excellent to poor scale is simply to count the number of respondents in each category and to compare frequencies. Sensory scientists should not assume that any scale has interval properties in spite of how easy it is to tabulate data as an integer series.
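    The frequency-count analysis suggested above might look like the following minimal sketch. The responses are invented for illustration and the function name is an assumption; the point is only that the tabulation never converts the labels to integers.

```python
from collections import Counter

LABELS = ["excellent", "very good", "good", "fair", "poor"]


def category_frequencies(responses):
    """Count respondents per category without assigning integer scores."""
    counts = Counter(responses)
    return {label: counts.get(label, 0) for label in LABELS}


# Hypothetical responses for two products
product_a = ["good", "very good", "good", "excellent", "fair", "good"]
product_b = ["fair", "poor", "good", "fair", "poor", "fair"]

for name, data in (("A", product_a), ("B", product_b)):
    print(name, category_frequencies(data))
```

    The two frequency tables can then be compared with a test suited to counts rather than by a comparison of means.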


    7.7.4 What is a “Valid” Scale?

    An ongoing issue in psychophysics is what kind of scale is a true reflection of the subject’s actual sensations. From this perspective, a scale is valid when the numbers generated reflect a linear translation of subjective intensity (the private experience). It is well established that category scales and magnitude estimates, when given to the same stimuli, will form a curve when plotted against one another (Stevens and Galanter, 1957). Because this is not a linear relationship, one method or the other must result from a non-linear translation of the subjective intensities of the stimuli. Therefore, by this criterion, at least one scale must be “invalid.”


    Anderson (1974, 1977) proposed a functional measurement theory to address this issue. In a typical experiment, he would ask subjects to do some kind of combination task, like judging the total combined intensity of two separately presented items (or the average lightness of two gray swatches). He would set up a factorial design in which every level of one stimulus was combined with every level of the other (i.e., a complete block). When plotting the response, a family of lines would be seen when the first stimulus continuum formed the X-axis and the second formed a family of lines. Anderson argued that only when the response combination rule was additive, and the response output function was linear, would a parallel plot be obtained (i.e., there would be no significant interaction term in ANOVA). This argument is illustrated in Fig. 7.6. In his studies using simple line and category scales, parallelism was obtained in a number of studies, and thus he reasoned that magnitude estimation was invalid by this criterion. If magnitude estimation is invalid, then its derivatives such as the LMS and LAM scales are similarly suspect.
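    The parallelism argument can be illustrated numerically. The sketch below is not Anderson's own analysis: it builds a hypothetical 3 x 3 factorial table of mean responses, computes the interaction residuals that an ANOVA would test, and shows that they vanish for an additive rule with a linear output but not when a non-linear output function (squaring, chosen arbitrarily) is applied.

```python
def interaction_residuals(table):
    """Residuals after removing row and column effects from a factorial table.

    table[i][j] is the mean response for level i of stimulus 1 combined with
    level j of stimulus 2. All residuals are zero when the lines are parallel.
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(table[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    return [[table[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)] for i in range(n_rows)]


def max_abs(matrix):
    return max(abs(value) for row in matrix for value in row)


# Hypothetical subjective values for low, medium, and high levels
levels_1 = [1.0, 2.0, 3.0]
levels_2 = [0.0, 2.0, 4.0]

additive = [[a + b for b in levels_2] for a in levels_1]   # linear output
distorted = [[(a + b) ** 2 for b in levels_2] for a in levels_1]  # non-linear output

print(max_abs(interaction_residuals(additive)))   # parallel lines
print(max_abs(interaction_residuals(distorted)))  # lines fan apart
```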

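    Fig. 7.6 The functional measurement scheme of Anderson (1974). [Figure not reproduced. Diagram: stimuli S1 and S2 pass through a psychophysical process to give sensations p1 and p2, an integration process combines them into P, and a response output process yields the response R. Accompanying plot: response for the combination judgment against levels of stimulus or variable 1 (low, medium, high), with one line for each level of stimulus or variable 2 (low, medium, high).]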

    Others have found support for the validity of magnitude estimation in studies of binaural loudness summation (Marks, 1978). This argument continues and is difficult to resolve. A review of the matter was published by Gescheider (1988). For the purposes of sensory evaluation, the issue is not terribly important for two reasons. First, any scale that produces statistically significant differentiation of products is a useful scale. Second, the physical ranges over which category scaling and magnitude estimation produce different results are usually quite large in any psychophysical study. In most product tests, the differences are much more subtle and generally do not span such a wide dynamic range. The issue dissolves from any practical perspective.

    7.8 Conclusions

    Much sound and fury has been generated over the years in the psychophysical literature concerning what methods yield valid scales. For the sensory practitioner, these issues are less relevant because the scale values do not generally have any absolute meaning. They are only convenient indices of the relative intensities or appeal of different products. The degree of difference may be a useful piece of information, but often we are simply interested in which product is stronger or weaker in some attribute, and whether the difference is both statistically significant and practically meaningful. Scaling provides a quick and useful way to get intensity or liking information. In the case of descriptive analysis, scaling allows collection of quantitative data on multiple attributes. The degree of variability or noise in the system is, in large part, determined by whether the panelists have a common frame of reference. Thus reference standards for both the attribute terms and for the intensity anchors are useful. Of course, with consumer evaluations or a psychophysical study such calibration is not possible and usually not desired. The variability of consumer responses should offer a note of caution in the interpretation of consumer scaling data. Students and sensory practitioners should examine their scaling methods with a critical eye. Not every task that assigns numbers will have useful scale properties like equal intervals. Bad examples abound in the commodity grading (quality scoring) literature (Pangborn and Dunkley, 1964). For example, different numbers may be assigned to different levels of oxidation, but that is scoring a physical condition based on inferences from sensory experience. It is not a report of the intensity of some experience itself. It is not tracking changes along a single perceptual continuum in the psychophysical sense. Scoring is not scaling. All hedonic scales seem to measure what they are intended to measure rather effectively, as long as no gross mistakes are made (Peryam, 1989, p. 23).

    Appendix 1: Derivation of Thurstonian-Scale Values for the 9-Point Scale

    The choice of adjective words for the 9-point hedonic scale is a good example of how carefully a scale can be constructed. The long-standing track record of this tool demonstrates its utility and wide applicability in consumer testing. However, few sensory practitioners actually know how the adjectives were found and what criteria were brought to bear in selecting these descriptors (slightly, moderately, very much, and extremely like/dislike) from a larger pool of possible words. The goal of this section is to provide a shorthand description of the criteria and mathematical method used to select the words for this scale. One concern was the degree to which the term had consensual meaning in the population. The most serious concern was when a candidate word had an ambiguous or double meaning across the population. For example, the word “average” suggests an intermediate response to some people, but in the original study by Jones and Thurstone (1955) there was a group of people who equated it with “like moderately,” perhaps since an average product in those days was one that people would like. These days, one can think of negative connotations to the word “average” as in “he was only an average student.” Other ambiguous or bimodal terms were “like not so much” and “like not so well.” Ideally, a term should have low variability in meaning, i.e., a low standard deviation, no bimodality, and little skew. Part of this concern with the normality of the distribution of psychological reactions to a word was the fact that the developers used Thurstone’s model


    that marks the ordinate in equal standard deviation units according to the cumulative normal distribution. Either of these methods will tend to make the cumulative S-shaped curve for the item into a straight line. The X-axis value for each point is the “category z-value” for that bucket.

    6. Fit a line to the data and interpolate the 50% point on the X-axis (the re-scaled category boundary estimates). These interpolated values for the median for each item now form the new scale values for the items.

    An example of this interpolation is shown in Fig. 7.7. Three of the phrases used in the original scaling study of Jones and Thurstone (1955) are pictured, three that were not actually chosen but for which we have approximate proportions and z-scores from their figures. The small vertical arrows on the X-axis show the scale values for the original categories of −4 to +3 (+4 has cumulative proportion of 100% and thus the z-score is infinite). Table 7.1 gives the values and proportions for each phrase and the original categories. The dashed vertical lines dropped from the intersection at the zero z-score (50% point) show the approximate mean values interpolated on the X-axis (i.e., about −1.1 for “do not care for it” and about +2.1 for “preferred”). Note that “preferred” and “do not care for it” have a linear fit and steep slope, suggesting a normal distribution and low standard deviation. In contrast, “highly unfavorable” has a lower slope and some curvilinearity, indicative of higher variability, skew, and/or pockets of disagreement about the connotation of this term. The actual scale values for the original adjectives are shown in Table 7.2, as found with a soldier population circa 1950 (Jones et al., 1955). You may note that the words are not equally spaced, and that the “slightly” values are closer to the neutral point than some of the other intervals, and the extreme points are a little farther out. This bears a good deal of similarity to the intervals found with the LAM scale, as shown in the column of Table 7.2 where the LAM values are re-scaled to the same range as the 9-point Thurstonian values.
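    The line fit and interpolation of step 6 can be sketched as follows, using the “preferred” values from Table 7.1. Two things here are assumptions rather than statements of the original procedure: the pairing of the five “preferred” z-scores with the original categories +3 to −1, and the use of an ordinary least-squares fit. The second function shows one re-scaling of the LAM values (a single multiplier that matches the total range of the Thurstonian values) that happens to reproduce the re-scaled column of Table 7.2 to two decimals; whether this was the method actually used is not stated in the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scale_value(category_z, phrase_z):
    """X-axis value where the fitted line crosses z = 0 (the 50% point)."""
    slope, intercept = fit_line(category_z, phrase_z)
    return -intercept / slope


def rescale_to_range(values, target):
    """Multiply values by one factor so that their range equals that of target."""
    factor = (max(target) - min(target)) / (max(values) - min(values))
    return [round(v * factor, 2) for v in values]


# Category z-values for original categories +3 to -1 (assumed alignment)
category_z = [3.0, 2.1, 1.2, 0.3, -0.3]
preferred_z = [0.84, 0.00, -0.84, -1.48, -1.88]
preferred_value = scale_value(category_z, preferred_z)
print(f"interpolated scale value for 'preferred': {preferred_value:.2f}")

thurstone = [4.16, 2.91, 1.12, 0.69, 0.00, -0.59, -1.20, -2.49, -4.32]
lam = [74.2, 56.1, 36.2, 11.2, 0.0, -10.6, -31.9, -55.5, -75.5]
lam_rescaled = rescale_to_range(lam, thurstone)
print(lam_rescaled)
```

    The interpolated value for “preferred” falls close to the +2.1 read from Fig. 7.7, which is some reassurance that the assumed alignment of rows is reasonable.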

    Fig. 7.7 An illustration of the method used to establish spacings and scale values for the 9-point hedonic scale using Thurstonian theory. Arrows on the X-axis show the scale points for the z-scores based on the complete distribution of the original −4 to +4 ratings. The Y-axis shows the actual z-scores based on the proportion of respondents using that category for each specific term. Re-plotted from data provided in Jones et al. (1955).

    Table 7.1 Examples of scaled phrases used in Fig. 7.7

    Original                            “Preferred”             “Do not care for it”    “Highly unfavorable”
    category    Proportion  Z-score     Proportion  Z-score     Proportion  Z-score     Proportion  Z-score
    4           1.000       (undef.)
    3           0.999       3.0         0.80        0.84                                0.96        1.75
    2           0.983       2.1         0.50        0.00                                0.93        1.48
    1           0.882       1.2         0.20        −0.84                               0.92        1.41
    0           0.616       0.3         0.07        −1.48       0.96        1.75        0.90        1.28
    −1          0.383       −0.3        0.03        −1.88       0.83        0.95        0.86        1.08
    −2          0.185       −0.9                                0.55        0.13        0.84        0.99
    −3          0.068       −1.5                                0.30        −0.52       0.82        0.92
    −4          0.008       −2.4                                0.14        −1.08       0.46        −0.10

    Table 7.2 Actual 9-point scale phrase values and comparison to the LAM values (each interval is the distance to the next descriptor in the list)

    Descriptor                9-point value  Interval  LAM value  LAM rescaled  Interval
    Like extremely            4.16           1.26      74.2       4.20          1.02
    Like very much            2.91           1.79      56.1       3.18          1.13
    Like moderately           1.12           0.43      36.2       2.05          1.52
    Like slightly             0.69           0.69      11.2       0.63          0.63
    Neither like nor dislike  0.00           0.59      0.0        0.00          0.60
    Dislike slightly          −0.59          0.61      −10.6      −0.60         1.21
    Dislike moderately        −1.20          1.29      −31.9      −1.81         1.33
    Dislike very much         −2.49          1.83      −55.5      −3.14         1.14
    Dislike extremely         −4.32                    −75.5      −4.28

    Appendix 2: Construction of Labeled Magnitude Scales

    There are two primary methods for constructing labeled magnitude scales and they are very similar. Both require magnitude estimates from the participants to scale the word phrases used on the lines. In one case, just the word phrases are scaled, and in the second method, the word phrases are scaled among a list of common everyday experiences or sensations that most people are familiar with. The values obtained by the simple scaling of just the words will depend upon the words that are chosen, and extremely high examples (e.g., greatest imaginable liking for any experience) will tend to compress the values of the interior phrases (Cardello et al., 2008). Whether this kind of context effect will occur for the more general method of scaling amongst common experiences is not known. But the use of a broad frame of reference could be a stabilizing factor. Here is an example of the instructions given to subjects in construction of a labeled affective magnitude scale. Note that for hedonics, which are a bipolar continuum with a neutral point, it is necessary to collect a tone or valence (plus or minus) value as well as the overall “intensity” rating.

    Next to each word label a response area appeared similar to this: Phrase: Like extremely

    Words or phrases are presented in random order. After reading a word, subjects must decide whether the word is positive, negative, or neutral and place the corresponding symbol on the first line. If the hedonic tone is not a neutral one (zero value), they are instructed to give a numerical estimate using modulus-free magnitude estimation. The following is a sample of the instructions taken from Cardello et al. (2008):

    After having determined whether the phrase is positive or negative or neutral and writing the appropriate symbol (+, −, 0) on the first line, you will then assess the strength or magnitude of the liking or disliking reflected by the phrase. You will do this by placing a number on the second blank line (under “How Much”). For the first phrase that you rate, you can write any number you want on the line. We suggest you do not use a small number for this word/phrase. The reason for this is that subsequent words/phrases may reflect much lower levels of liking or disliking. Aside from this restriction you can use any numbers you want. For each subsequent word/phrase your numerical judgment should be made proportionally and in comparison to the first number. That is, if you assigned the number 800 to index the strength of the liking/disliking denoted by the first word/phrase and the strength of liking/disliking denoted by the second word/phrase were twice as great, you would assign the number 1,600. If it were three times as great you would assign the number 2,400, etc. Similarly, if the second word/phrase denoted only 1/10 the magnitude of liking as the first, you would assign it the number 80 and so forth. If any word/phrase is judged to be “neutral” (zero (0) on the first line) it should also be given a zero for its magnitude rating.

    In the case of Cardello et al. (2008), positive and negative word labels were analyzed separately. Raw magnitude estimates were equalized for scale range using the procedure of Lane et al. (1961). All positive and negative magnitude estimates for a given subject were multiplied by an individual scaling factor. This factor was equal to the ratio of the grand geometric mean (of the absolute value of all nonzero ratings) across all subjects divided by the geometric mean for that subject. The geometric mean magnitude estimates for each phrase were then calculated based on this range-equated data. These means became the distance from the zero point for placement of the phrases along the scale, usually accompanied by a short cross-hatch mark at that point.
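    The range equalization described above can be sketched in Python as follows. The subjects, phrases, and numbers are invented, the function names are assumptions, and the sketch works on magnitudes only (the separate handling of positive and negative labels is left out), so it should be read as an illustration of the scaling factor rather than a reproduction of the published analysis.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def equalize_ranges(ratings):
    """Multiply each subject's estimates by grand GM / subject GM.

    ratings: {subject: {phrase: signed magnitude estimate}}. Zero (neutral)
    ratings are excluded from the geometric means and stay zero.
    """
    nonzero = [abs(v) for by_phrase in ratings.values()
               for v in by_phrase.values() if v != 0]
    grand = geometric_mean(nonzero)
    equalized = {}
    for subject, by_phrase in ratings.items():
        own = geometric_mean([abs(v) for v in by_phrase.values() if v != 0])
        factor = grand / own
        equalized[subject] = {p: v * factor for p, v in by_phrase.items()}
    return equalized


def phrase_distances(equalized):
    """Geometric mean magnitude per phrase: its distance from the zero point."""
    phrases = {p for by_phrase in equalized.values() for p in by_phrase}
    distances = {}
    for phrase in phrases:
        mags = [abs(by_phrase[phrase]) for by_phrase in equalized.values()
                if by_phrase.get(phrase, 0) != 0]
        distances[phrase] = geometric_mean(mags) if mags else 0.0
    return distances


# Two hypothetical subjects who use very different number ranges
ratings = {
    "s1": {"like slightly": 10, "like extremely": 100, "neither like nor dislike": 0},
    "s2": {"like slightly": 100, "like extremely": 1000, "neither like nor dislike": 0},
}
equalized = equalize_ranges(ratings)
distances = phrase_distances(equalized)
print(equalized)
print(distances)
```

    After equalization the two subjects give identical values, since they differed only in the size of the numbers they chose and not in the ratios among phrases.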

    7

    Scaling

    Basker, D. 1988. Critical values of differences among rank sums for multiple comparisons. Food Technology, 42(2), 79, 80-84.
    Baten, W. D. 1946. Organoleptic tests pertaining to apples and pears. Food Research, 11, 84-94.
    Bendig, A. W. and Hughes, J. B. 1953. Effect of number of verbal anchoring and number of rating scale categories upon transmitted information. Journal of Experimental Psychology, 46(2), 87-90.
    Bi, J. 2006. Sensory Discrimination Tests and Measurement. Blackwell, Ames, IA.
    Birch, L. L., Zimmerman, S. I. and Hind, H. 1980. The influence of social-affective context on the formation of children’s food preferences. Child Development, 51, 856-861.
    Birch, L. L., Birch, D., Marlin, D. W. and Kramer, L. 1982. Effects of instrumental consumption on children’s food preferences. Appetite, 3, 125-143.
    Birnbaum, M. H. 1982. Problems with so-called “direct” scaling. In: J. T. Kuznicki, R. A. Johnson and A. F. Rutkiewic (Eds.), Selected Sensory Methods: Problems and Approaches to Hedonics. American Society for Testing and Materials, Philadelphia, pp. 34-48.
    Borg, G. 1982. A category scale with ratio properties for intermodal and interindividual comparisons. In: H.-G. Geissler and P. Petzold (Eds.), Psychophysical Judgment and the Process of Perception. VEB Deutscher Verlag der Wissenschaften, Berlin, pp. 25-34.
    Borg, G. 1990. Psychophysical scaling with applications in physical work and the perception of exertion. Scandinavian Journal of Work and Environmental Health, 16, 55-58.
    Boring, E. G. 1942. Sensation and Perception in the History of Experimental Psychology. Appleton-Century-Crofts, New York.
    Brandt, M. A., Skinner, E. Z. and Coleman, J. A. 1963. The texture profile method. Journal of Food Science, 28, 404-409.
    Butler, G., Poste, L. M., Wolynetz, M. S., Agar, V. E. and Larmond, E. 1987. Alternative analyses of magnitude estimation data. Journal of Sensory Studies, 2, 243-257.
    Cardello, A. V. and Schutz, H. G. 2004. Research note. Numerical scale-point locations for constructing the LAM (Labeled affective magnitude) scale. Journal of Sensory Studies, 19, 341-346.
    Cardello, A. V., Lawless, H. T. and Schutz, H. G. 2008. Effects of extreme anchors and interior label spacing on labeled magnitude scales. Food Quality and Preference, 21, 323-334.
    Cardello, A. V., Winterhaler, C. and Schutz, H. G. 2003. Predicting the handle and comfort of military clothing fabrics from sensory and instrumental data: Development and application of new psychophysical methods. Textile Research Journal, 73, 221-237.
    Cardello, A. V., Schutz, H. G., Lesher, L. L. and Merrill, E. 2005. Development and testing of a labeled magnitude scale of perceived satiety. Appetite, 44, 1-13.
    Caul, J. F. 1957. The profile method of flavor analysis. Advances in Food Research, 7, 1-40.
    Chambers, E. C. and Wolf, M. B. 1996. Sensory Testing Methods. ASTM Manual Series, MNL 26. ASTM International, West Conshohocken, PA.
    Chen, A. W., Resurreccion, A. V. A. and Paguio, L. P. 1996. Age appropriate hedonic scales to measure the food preferences of young children. Journal of Sensory Studies, 11, 141-163.

    Chung, S.-J. and Vickers, Z. 2007a. Long-term acceptability and choice of teas differing in sweetness. Food Quality and Preference, 18, 963-974.
    Chung, S.-J. and Vickers, Z. 2007b. Influence of sweetness on the sensory-specific satiety and long-term acceptability of tea. Food Quality and Preference, 18, 256-267.
    Coetzee, H. and Taylor, J. R. N. 1996. The use and adaptation of the paired comparison method in the sensory evaluation of hamburger-type patties by illiterate/semi-literate consumers. Food Quality and Preference, 7, 81-85.
    Collins, A. A. and Gescheider, G. A. 1989. The measurement of loudness in individual children and adults by absolute magnitude estimation and cross modality matching. Journal of the Acoustical Society of America, 85, 2012-2021.
    Conner, M. T. and Booth, D. A. 1988. Preferred sweetness of a lime drink and preference for sweet over non-sweet foods, related to sex and reported age and body weight. Appetite, 10, 25-35.
    Cordonnier, S. M. and Delwiche, J. F. 2008. An alternative method for assessing liking: Positional relative rating versus the 9-point hedonic scale. Journal of Sensory Studies, 23, 284-292.
    Cox, E. P. 1980. The optimal number of response alternatives for a scale: A review. Journal of Marketing Research, 18, 407-422.
    Curtis, D. W., Attneave, F. and Harrington, T. L. 1968. A test of a two-stage model of magnitude estimation. Perception and Psychophysics, 3, 25-31.
    Edwards, A. L. 1952. The scaling of stimuli by the method of successive intervals. Journal of Applied Psychology, 36, 118-122.
    Ekman, G. 1964. Is the power law a special case of Fechner’s law? Perceptual and Motor Skills, 19, 730.
    Einstein, M. A. 1976. Use of linear rating scales for the evaluation of beer flavor by consumers. Journal of Food Science, 41, 383-385.
    El Dine, A. N. and Olabi, A. 2009. Effect of reference foods in repeated acceptability tests: Testing familiar and novel foods using 2 acceptability scales. Journal of Food Science, 74, S97-S105.
    Engen, T. 1974. Method and theory in the study of odor preferences. In: A. Turk, J. W. Johnson and D. G. Moulton (Eds.), Human Responses to Environmental Odors. Academic, New York.
    Finn, A. and Louviere, J. J. 1992. Determining the appropriate response to evidence of public concern: The case of food safety. Journal of Public Policy and Marketing, 11, 12-25.
    Forde, C. G. and Delahunty, C. M. 2004. Understanding the role cross-modal sensory interactions play in food acceptability in younger and older consumers. Food Quality and Preference, 15, 715-727.
    Frijters, J. E. R., Kooistra, A. and Vereijken, P. F. G. 1980. Tables of d’ for the triangular method and the 3-AFC signal detection procedure. Perception and Psychophysics, 27, 176-178.
    Gaito, J. 1980. Measurement scales and statistics: Resurgence of an old misconception. Psychological Bulletin, 87, 564-587.
    Gay, C. and Mead, R. 1992. A statistical appraisal of the problem of sensory measurement. Journal of Sensory Studies, 7, 205-228.

    Gent, J. F. and Bartoshuk, L. M. 1983. Sweetness of sucrose, neohesperidin dihydrochalcone and saccharin is related to genetic ability to taste the bitter substance 6-n-propylthiouracil. Chemical Senses, 7, 265-272.
    Gescheider, G. A. 1988. Psychophysical scaling. Annual Review of Psychology, 39, 169-200.
    Giovanni, M. E. and Pangborn, R. M. 1983. Measurement of taste intensity and degree of liking of beverages by graphic scaling and magnitude estimation. Journal of Food Science, 48, 1175-1182.
    Gracely, R. H., McGrath, P. and Dubner, R. 1978a. Ratio scales of sensory and affective verbal-pain descriptors. Pain, 5, 5-18.
    Gracely, R. H., McGrath, P. and Dubner, R. 1978b. Validity and sensitivity of ratio scales of sensory and affective verbal-pain descriptors: Manipulation of affect by Diazepam. Pain, 5, 19-29.
    Green, B. G., Shaffer, G. S. and Gilmore, M. M. 1993. Derivation and evaluation of a semantic scale of oral sensation magnitude with apparent ratio properties. Chemical Senses, 18, 683-702.
    Green, B. G., Dalton, P., Cowart, B., Shaffer, G., Rankin, K. and Higgins, J. 1996. Evaluating the “Labeled Magnitude Scale” for measuring sensations of taste and smell. Chemical Senses, 21, 323-334.
    Greene, J. L., Bratka, K. J., Drake, M. A. and Sanders, T. H. 2006. Effectiveness of category and line scales to characterize consumer perception of fruity fermented flavors in peanuts. Journal of Sensory Studies, 21, 146-154.
    Guest, S., Essick, G., Patel, A., Prajapati, R. and McGlone, F. 2007. Labeled magnitude scales for oral sensations of wetness, dryness, pleasantness and unpleasantness. Food Quality and Preference, 18, 342-352.
    Hein, K. A., Jaeger, S. R., Carr, B. T. and Delahunty, C. M. 2008. Comparison of five common acceptance and preference methods. Food Quality and Preference, 19, 651-661.
    Huskisson, E. C. 1983. Visual analogue scales. In: R. Melzack (Ed.), Pain Measurement and Assessment. Raven, New York, pp. 34-37.
    Jaeger, S. R., Jørgensen, A. S., Aaslyng, M. D. and Bredie, W. L. P. 2008. Best-worst scaling: An introduction and initial comparison with monadic rating for preference elicitation with food products. Food Quality and Preference, 19, 579-588.
    Jaeger, S. R. and Cardello, A. V. 2009. Direct and indirect hedonic scaling methods: A comparison of the labeled affective magnitude (LAM) scale and best-worst scaling. Food Quality and Preference, 20, 249-258.
    Jones, F. N. 1974. History of psychophysics and judgment. In: E. C. Carterette and M. P. Friedman (Eds.), Handbook of Perception. Psychophysical Judgment and Measurement, Vol. 2. Academic, New York, pp. 11-22.
    Jones, L. V. and Thurstone, L. L. 1955. The psychophysics of semantics: An experimental investigation. Journal of Applied Psychology, 39, 31-36.
    Jones, L. V., Peryam, D. R. and Thurstone, L. L. 1955. Development of a scale for measuring soldier’s food preferences. Food Research, 20, 512-520.
    Keskitalo, K., Knaapila, A., Kallela, M., Palotie, A., Wessman, M., Sammalisto, S., Peltonen, L., Tuorila, H. and Perola, M. 2007. Sweet taste preferences are partly genetically determined: Identification of a trait locus on Chromosome 16. American Journal of Clinical Nutrition, 86, 55-63.
    Kim, K.-O. and O’Mahony, M. 1998. A new approach to category scales of intensity I: Traditional versus rank-rating. Journal of Sensory Studies, 13, 241-249.
    King, B. M. 1986. Odor intensity measured by an audio method. Journal of Food Science, 51, 1340-1344.
    Koo, T.-Y., Kim, K.-O. and O’Mahony, M. 2002. Effects of forgetting on performance on various intensity scaling protocols: Magnitude estimation and labeled magnitude scale (Green scale). Journal of Sensory Studies, 17, 177-192.
    Kroll, B. J. 1990. Evaluating rating scales for sensory testing with children. Food Technology, 44(11), 78-80, 82, 84, 86.
    Kurtz, D. B., White, T. L. and Hayes, M. 2000. The labeled dissimilarity scale: A metric of perceptual dissimilarity. Perception and Psychophysics, 62, 152-161.
    Land, D. G. and Shepard, R. 1984. Scaling and ranking methods. In: J. R. Piggott (Ed.), Sensory Analysis of Foods. Elsevier Applied Science, London, pp. 141-177.
    Lane, H. L., Catania, A. C. and Stevens, S. S. 1961. Voice level: Autophonic scale, perceived loudness and effect of side tone. Journal of the Acoustical Society of America, 33, 160-167.
    Larson-Powers, N. and Pangborn, R. M. 1978. Descriptive analysis of the sensory properties of beverages and gelatins containing sucrose or synthetic sweeteners. Journal of Food Science, 43, 47-51.
    Lawless, H. T. 1977. The pleasantness of mixtures in taste and olfaction. Sensory Processes, 1, 227-237.
    Lawless, H. T. 1989. Logarithmic transformation of magnitude estimation data and comparisons of scaling methods. Journal of Sensory Studies, 4, 75-86.
    Lawless, H. T. and Clark, C. C. 1992. Psychological biases in time intensity scaling. Food Technology, 46, 81, 84-86, 90.
    Lawless, H. T. and Malone, J. G. 1986a. The discriminative efficiency of common scaling methods. Journal of Sensory Studies, 1, 85-96.
    Lawless, H. T. and Malone, G. J. 1986b. A comparison of scaling methods: Sensitivity, replicates and relative measurement. Journal of Sensory Studies, 1, 155-174.
    Lawless, H. T. and Skinner, E. Z. 1979. The duration and perceived intensity of sucrose taste. Perception and Psychophysics, 25, 249-258.
    Lawless, H. T., Popper, R. and Kroll, B. J. 2010a. Comparison of the labeled affective magnitude (LAM) scale, an 11-point category scale and the traditional nine-point hedonic scale. Food Quality and Preference, 21, 4-12.
    Lawless, H. T., Sinopoli, D. and Chapman, K. W. 2010b. A comparison of the labeled affective magnitude scale and the nine point hedonic scale and examination of categorical behavior. Journal of Sensory Studies, 25, S1, 54-66.
    Lawless, H. T., Cardello, A. V., Chapman, K. W., Lesher, L. L., Given, Z. and Schutz, H. G. 2010c. A comparison of the effectiveness of hedonic scales and end-anchor compression effects. Journal of Sensory Studies, 28, S1, 18-34.
    Lee, H.-J., Kim, K.-O. and O’Mahony, M. 2001. Effects of forgetting on various protocols for category and line scales of intensity. Journal of Sensory Studies, 327-342.
    Likert, R. 1932. Technique for the measurement of attitudes. Archives of Psychology, 140, 1-55.


    Lindvall, T. and Svensson, L. T. 1974. Equal unpleasantness matching of malodourous substances in the community. Journal of Applied Psychology, 59, 264-269.
    Mahoney, C. H., Stier, H. L. and Crosby, E. A. 1957. Evaluating flavor differences in canned foods. II. Fundamentals of the simplified procedure. Food Technology, 11, Supplemental Symposium Proceedings, 37-42.
    Marks, L. E. 1978. Binaural summation of the loudness of pure tones. Journal of the Acoustical Society of America, 64, 107-113.
    Marks, L. E., Borg, G. and Ljunggren, G. 1983. Individual differences in perceived exertion assessed by two new methods. Perception and Psychophysics, 34, 280-288.
    Marks, L. E., Borg, G. and Westerlund, J. 1992. Differences in taste perception assessed by magnitude matching and by category-ratio scaling. Chemical Senses, 17, 493-506.
    Mattes, R. D. and Lawless, H. T. 1985. An adjustment error in optimization of taste intensity. Appetite, 6, 103-114.
    McBride, R. L. 1983a. A JND-scale/category scale convergence in taste. Perception and Psychophysics, 34, 77-83.
    McBride, R. L. 1983b. Taste intensity and the case of exponents greater than 1. Australian Journal of Psychology, 35, 175-184.
    McBurney, D. H. and Shick, T. R. 1971. Taste and water taste for 26 compounds in man. Perception and Psychophysics, 10, 249-252.
    McBurney, D. H. and Bartoshuk, L. M. 1973. Interactions between stimuli with different taste qualities. Physiology and Behavior, 10, 1101-1106.
    McBurney, D. H., Smith, D. V. and Shick, T. R. 1972. Gustatory cross-adaptation: Sourness and bitterness. Perception and Psychophysics, 11, 228-232.
    Mead, R. and Gay, C. 1995. Sequential design of sensory trials. Food Quality and Preference, 6, 271-280.
    Mecredy, J. M., Sonnemann, J. C. and Lehmann, S. J. 1974. Sensory profiling of beer by a modified QDA method. Food Technology, 28, 36-41.
    Meilgaard, M., Civille, G. V. and Carr, B. T. 2006. Sensory Evaluation Techniques, Fourth Edition. CRC, Boca Raton, FL.
    Moore, L. J. and Shoemaker, C. F. 1981. Sensory textural properties of stabilized ice cream. Journal of Food Science, 46, 399-402.
    Moskowitz, H. R. 1971. The sweetness and pleasantness of sugars. American Journal of Psychology, 84, 387-405.
    Moskowitz, H. R. and Sidel, J. L. 1971. Magnitude and hedonic scales of food acceptability. Journal of Food Science, 36, 677-680.
    Muñoz, A. M. and Civille, G. V. 1998. Universal, product and attribute specific scaling and the development of common lexicons in descriptive analysis. Journal of Sensory Studies, 13, 57-75.
    Newell, G. J. and MacFarlane, J. D. 1987. Expanded tables for multiple comparison procedures in the analysis of ranked data. Journal of Food Science, 52, 1721-1725.
    Olabi, A. and Lawless, H. T. 2008. Persistence of context effects with training and reference standards. Journal of Food Science, 73, S185-S189.
    O’Mahony, M., Park, H., Park, J. Y. and Kim, K.-O. 2004. Comparison of the statistical analysis of hedonic data using analysis of variance and multiple comparisons versus an R-index analysis of the ranked data. Journal of Sensory Studies, 19, 519-529.
    Pangborn, R. M. and Dunkley, W. L. 1964. Laboratory procedures for evaluating the sensory properties of milk. Dairy Science Abstracts, 26, 55-62.
    Parducci, A. 1965. Category judgment: A range-frequency model. Psychological Review, 72, 407-418.
    Park, J.-Y., Jeon, S.-Y., O’Mahony, M. and Kim, K.-O. 2004. Induction of scaling errors. Journal of Sensory Studies, 19, 261-271.
    Pearce, J. H., Korth, B. and Warren, C. B. 1986. Evaluation of three scaling methods for hedonics. Journal of Sensory Studies, 1, 27-46.
    Peryam, D. 1989. Reflections. In: Sensory Evaluation. In Celebration of our Beginnings. American Society for Testing and Materials, Philadelphia, pp. 21-30.
    Peryam, D. R. and Girardot, N. F. 1952. Advanced taste-test method. Food Engineering, 24, 58-61, 194.
    Piggot, J. R. and Harper, R. 1975. Ratio scales and category scales for odour intensity. Chemical Senses and Flavour, 1, 307-316.
    Pokorný, J., Davídek, J., Prnka, V. and Davídková, E. 1986. Nonparametric evaluation of graphical sensory profiles for the analysis of carbonated beverages. Die Nahrung, 30, 131-139.
    Poulton, E. C. 1989. Bias in Quantifying Judgments. Lawrence Erlbaum, Hillsdale, NJ.
    Richardson, L. F. and Ross, J. S. 1930. Loudness and telephone current. Journal of General Psychology, 3, 288-306.
    Rosenthal, R. 1987. Judgment Studies: Design, Analysis and Meta-Analysis. University Press, Cambridge.
    Shand, P. J., Hawrysh, Z. J., Hardin, R. T. and Jeremiah, L. E. 1985. Descriptive sensory analysis of beef steaks by category scaling, line scaling and magnitude estimation. Journal of Food Science, 50, 495-499.
    Schutz, H. G. and Cardello, A. V. 2001. A labeled affective magnitude (LAM) scale for assessing food liking/disliking. Journal of Sensory Studies, 16, 117-159.
    Siegel, S. 1956. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.
    Sriwatanakul, K., Kelvie, W., Lasagna, L., Calimlim, J. F., Wels, O. F. and Mehta, G. 1983. Studies with different types of visual analog scales for measurement of pain. Clinical Pharmacology and Therapeutics, 34, 234-239.
    Stevens, J. C. and Marks, L. M. 1980. Cross-modality matching functions generated by magnitude estimation. Perception and Psychophysics, 27, 379-389.
    Stevens, S. S. 1951. Mathematics, measurement and psychophysics. In: S. S. Stevens (Ed.), Handbook of Experimental Psychology. Wiley, New York, pp. 1-49.

    Stevens, S. S. 1956. The direct estimation of sensory magnitudes-loudness. American Journal of Psychology, 69, 1-25.
    Stevens, S. S. 1957. On the psychophysical law. Psychological Review, 64, 153-181.
    Stevens, S. S. 1969. On predicting exponents for cross-modality matches. Perception and Psychophysics, 6, 251-256.
    Stevens, S. S. and Galanter, E. H. 1957. Ratio scales and category scales for a dozen perceptual continua. Journal of Experimental Psychology, 54, 377-411.
    Stoer, N. L. and Lawless, H. T. 1993. Comparison of single product scaling and relative-to-reference scaling in sensory evaluation of dairy products. Journal of Sensory Studies, 8, 257-270.
    Stone, H., Sidel, J., Oliver, S., Woolsey, A. and Singleton, R. C. 1974. Sensory Evaluation by quantitative descriptive analysis. Food Technology, 28, 24-29, 32, 34.
    Teghtsoonian, M. 1980. Children’s scales of length and loudness: A developmental application of cross-modal matching. Journal of Experimental Child Psychology, 30, 290-307.
    Thurstone, L. L. 1927. A law of comparative judgment. Psychological Review, 34, 273-286.
    Townsend, J. T. and Ashby, F. G. 1980. Measurement scales and statistics: The misconception misconceived. Psychological Bulletin, 96, 394-401.
    Vickers, Z. M. 1983. Magnitude estimation vs. category scaling of the hedonic quality of food sounds. Journal of Food Science, 48, 1183-1186.
    Villanueva, N. D. M. and Da Silva, M. A. A. P. 2009. Performance of the nine-point hedonic, hybrid and self-adjusting scales in the generation of internal preference maps. Food Quality and Preference, 20, 1-12.
    Villanueva, N. D. M., Petenate, A. J. and Da Silva, M. A. A. P. 2005. Comparative performance of the hybrid hedonic scale as compared to the traditional hedonic, self-adjusting and ranking scales. Food Quality and Preference, 16, 691-703.
    Ward, L. M. 1986. Mixed-modality psychophysical scaling: Double cross-modality matching for “difficult” continua. Perception and Psychophysics, 39, 407-417.
    Weiss, D. J. 1972. Averaging: an empirical validity criterion for magnitude estimation. Perception and Psychophysics, 12, 385-388.
    Winakor, G., Kim, C. J. and Wolins, L. 1980. Fabric hand: Tactile sensory assessment. Textile Research Journal, 50, 601-610.
    Yamaguchi, S. 1967. The synergistic effect of monosodium glutamate and disodium 5′-inosinate. Journal of Food Science, 32, 473-477.

    Chapter 8

    Time-Intensity Methods

    Contents

    8.1 Introduction
    8.2 A Brief History
    8.3 Variations on the Method
        8.3.1 Discrete or Discontinuous Sampling
        8.3.2 “Continuous” Tracking
        8.3.3 Temporal Dominance Techniques
    8.4 Recommended Procedures
        8.4.1 Steps in Conducting a Time-intensity Study
        8.4.2 Procedures
        8.4.3 Recommended Analysis
    8.5 Data Analysis Options
        8.5.1 General Approaches
        8.5.2 Methods to Construct or Describe Average Curves
        8.5.3 Case Study: Simple Geometric Description
        8.5.4 Analysis by Principal Components
    8.6 Examples and Applications
        8.6.1 Taste and Flavor Sensation Tracking
        8.6.2 Trigeminal and Chemical/Tactile Sensations
        8.6.3 Taste and Odor Adaptation
        8.6.4 Texture and Phase Change
        8.6.5 Flavor Release
        8.6.6 Temporal Aspects of Hedonics
    8.7 Issues
    8.8 Conclusions
    References

    H.T. Lawless, H. Heymann, Sensory Evaluation of Food, Food Science Text Series, DOI 10.1007/978-1-4419-6488-5_8, © Springer Science+Business Media, LLC 2010

    8.1 Introduction

    Perception of aroma, taste, flavor, and texture in foods is a dynamic, not a static, phenomenon. In other words, the perceived intensity of the sensory attributes changes from moment to moment. The dynamic nature of food sensations arises from processes of chewing, breathing, salivation, tongue movements, and swallowing (Dijksterhuis, 1996). In the texture profile method, for instance, different phases of food breakdown were recognized early on, as evidenced by the separation of characteristics into first bite, mastication, and residual phases (Brandt et al., 1963). Wine tasters often discuss how a wine “opens in the glass,” recognizing that the flavor will vary as a function of time after opening the bottle and exposing the wine to air. It is widely believed that the consumer acceptability of different intensive sweeteners depends on the similarity of their time profile to that of sucrose. Intensive sweeteners with too long a duration in the mouth may be less pleasant to consumers. Conversely, a chewing gum with long-lasting flavor or a wine with a “long finish” may be desirable. These examples demonstrate how the time profile of a food or beverage can be an important aspect of its sensory appeal.

    The common methods of sensory scaling ask the panelists to rate the perceived intensity of the sensation by giving a single (uni-point) rating. To provide that single value, the panelists must either “time-average” (integrate) any changing sensations or estimate only the peak intensity. Such a single value may miss some important information. It is possible, for example, for two products to have the same or similar time-averaged profiles or descriptive specifications, but to differ in the order in which different flavors occur or in when they reach their peak intensities.

    Time-intensity (TI) methods provide panelists with the opportunity to scale their perceived sensations over time. When multiple attributes are tracked, the profile of a complex food flavor or texture may show differences between products that change across time after a product is first tasted, smelled, or felt. For most sensations the perceived intensity increases and eventually decreases, but for some, like perceived toughness of meat, the sensations may only decrease as a function of time. For perceived melting, the sensation may only increase until a completely melted state is reached.
When performing a TI study the sensory specialist can obtain a wealth of detailed information such as the following for each sample: the maximum intensity perceived, the time to maximum intensity, the rate and shape of the increase in intensity to the maximum point, the rate and shape of the decrease in intensity to half-maximal intensity and to the extinction point, and the total duration of the sensation. Some of the common time-intensity parameters are illustrated in Fig. 8.1. The additional information derived from time-intensity methods is especially useful when studying sweeteners or products like chewing gums and hand lotions that have a distinctive time profile.


    Fig. 8.1 Example of a time-intensity curve and common curve parameters extracted from the record. Axes: time (horizontal) and intensity (vertical). Other parameters: DUR, total duration = Tend – Tstart; AUC, area under the curve.
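    The curve parameters named above can be computed from a sampled record in a few lines. The following is a minimal sketch, not a procedure prescribed in this chapter: the function name `ti_parameters`, the `threshold` argument, the choice to take Tstart and Tend as the first and last samples above threshold, and the trapezoidal rule for AUC are all assumptions made for illustration, and the example record is invented.

```python
def ti_parameters(times, intensities, threshold=0.0):
    """Summarize one time-intensity record (hypothetical helper).

    Assumes times are increasing and paired one-to-one with intensities.
    """
    times = list(times)
    intensities = list(intensities)
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need at least two paired samples")

    # Peak intensity and the first time at which it is reached.
    i_max = max(intensities)
    t_max = times[intensities.index(i_max)]

    # Assumption: onset and extinction are the first and last samples
    # whose intensity exceeds the threshold.
    above = [t for t, i in zip(times, intensities) if i > threshold]
    t_start = above[0] if above else None
    t_end = above[-1] if above else None
    dur = (t_end - t_start) if above else 0.0

    # Area under the curve by the trapezoidal rule.
    auc = sum(
        (times[k + 1] - times[k]) * (intensities[k] + intensities[k + 1]) / 2.0
        for k in range(len(times) - 1)
    )
    return {"Imax": i_max, "Tmax": t_max, "Tstart": t_start,
            "Tend": t_end, "DUR": dur, "AUC": auc}


# Invented record: seconds after tasting and ratings on a 0-100 line scale.
curve_t = [0, 5, 10, 15, 20, 25, 30]
curve_i = [0, 20, 60, 80, 50, 20, 0]
params = ti_parameters(curve_t, curve_i)
print(params)
```

    Two records with the same AUC can still differ in Tmax or DUR, which is the kind of difference a single time-averaged rating would hide.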

    The remainder of this chapter gives an overview of the history and applications of this method, as well as recommended procedures and analyses. For the student who wants only the basic information, the following sections are key: variations on the method (Section 8.3), steps and recommended procedures (Section 8.4), data analysis options (Section 8.5), and conclusions (Section 8.8).

    8.2 A Brief History

    Holway and Hurvich (1937) published an early report of tracking taste intensity over time. They had their subjects trace a curve to represent the sensations from a drop of either 0.5 or 1.0 M NaCl placed on the anterior tongue surface over 10 s. They noted several general effects that were later confirmed as common trends in other studies. The higher concentration led to a higher peak intensity, but the peak occurred later, in spite of a steeper rising slope. Most importantly, they noted that taste intensity was not strictly a function of concentration: “while the concentration originally placed on the tongue is ‘fixed,’ the intensity varies in a definite manner from moment to moment. Saline intensity depends on time as well as concentration.” A review of studies of temporal factors in taste is found in Halpern (1991). A review of TI studies of the 1980s and early 1990s can be found in Cliff and Heymann (1993b). Sjostrom (1954) and Jellinek (1964) also made early attempts to quantify the temporal response to perceived sensory intensities. These authors asked their panelists to indicate their perceived bitterness of beer at 1 s intervals on a ballot, using a clock to indicate time.


    They then constructed TI curves by plotting the x-y coordinates (time on the x-axis and perceived intensity on the y-axis) on graph paper. Once panelists had some experience with the method, it was possible to ask them to rate the perceived intensities of two different attributes simultaneously at 1 s intervals. Neilson (1957), in an attempt to make the production of the TI curves easier, asked panelists to indicate perceived bitterness directly on graph paper at 2 s timed intervals. The clock could be distracting to the panelists, and thus Meiselman (1968), studying taste adaptation, and McNulty and Moskowitz (1974), evaluating oil-in-water emulsions, improved the TI methodology by eliminating the clock. These authors used audible cues to tell the panelists when to enter perceived intensities on a ballot, placing the timekeeping demands on the experimenter rather than the participant. Larson-Powers and Pangborn (1978), in another attempt to eliminate the distractions of the clock or audible cues, employed a moving strip-chart recorder equipped with a foot pedal to start and stop the movement of the chart. Panelists recorded their responses to the perceived sweetness in beverages and gelatins sweetened with sucrose or synthetic sweeteners by moving a pen along the cutter bar of the strip-chart recorder. The cutter bar was labeled with an unstructured line scale, from none to extreme. A cardboard cover was placed over the moving chart paper to prevent the panelists from watching the evolving curves and thus from using any visual cues to bias their responses. A similar setup was independently developed and used at the General Foods Technical Center in 1977 to track sweetness intensity (Lawless and Skinner, 1979). In this apparatus, the actual pen carriage of the chart recorder was grasped by the subject, eliminating the need for them to position a pen; also, the moving chart was obscured by a line scale with a pointer attached to the pen carriage.
In yet another laboratory at about the same time, Birch and Munton (1981) developed the “SMURF” version of TI scaling (short for “Sensory Measuring Unit for Recording Flux”). In the SMURF apparatus, the subject turned a knob graded from 1 to 10, and this potentiometer fed a variable signal to a strip-chart recorder out of sight of the panelist. The use of strip-chart recorders provided the first continuous TI data-collection methods and freed the panelists from any distractions caused by a clock or auditory signal. However, the methods required a fair degree of mental and physical


    coordination by the participants. For example, in the Larson-Powers setup, the strip-chart recorder required the panelists to use a foot pedal to run the chart recorder, to place the sample in the mouth, and to move the pen to indicate the perceived intensity. Not all panelists were suitably coordinated and some could not do the evaluation. Although the strip-chart recorder made continuous evaluation of perceived intensities possible, the TI curves had to be digitized manually, which was extremely time consuming. The opportunity to use computers to time-sample an analog voltage signal quite naturally led to online data collection to escape the problem of manual measurement of TI curves. To the best of our knowledge, the first computerized system was developed at the US Army Natick Food Laboratories in 1979 to measure bitter taste adaptation. It employed an electric sensor in the spout just above the subject’s tongue in order to determine the actual onset of stimulus arrival. Subthreshold amounts of NaCl were added to the stimulus and thus created a conductance change as the flow changed from the preliminary water rinse to the stimulus interface in the tube. A special circuit was designed to detect the conductance change and to connect the response knob to the visual line scale. As in the SMURF apparatus developed by Birch and Munton, the subject turned a knob controlling a variable resistor. The output of this potentiometer moved a pointer on a line scale for visual feedback while a parallel analog signal was sent to an analog-to-digital converter and then to the computer. The programming was done using FORTRAN subroutines on a popular lab computer of that era. The entire system is shown in Fig. 8.2. The appearance of desktop computers led to an explosion in the use of TI methodology in the 1980s and 1990s. A number of thesis research projects from U.C.
Davis served as useful demonstrations of the method (Cliff, 1987; Dacanay, 1990; Rine, 1987) and the method was championed by Pangborn and coworkers (e.g., Lee and Pangborn, 1986). Several scientists (Barylko-Pikielna et al., 1990; Cliff, 1987; Guinard et al., 1985; Janusz et al., 1991; Lawless, 1980; Lee, 1985; Rine, 1987; Yoshida, 1986) developed computerized TI systems using a variety of hardware and software products. Computerized TI systems are now commercially available as part of data-collection software ensembles, greatly enhancing the ease and availability of TI data collection and data processing.


    Fig. 8.2 An early computerized system for time-intensity scaling used for tracking bitterness adaptation in a flow system for stimulating the anterior tongue. Heavy lines indicate the flow of stimulus solution, solid lines the flow of information, and dashed lines the experimenter-driven process control. Stimulus arrival at the tongue was monitored by conductivity sensors fitted into the glass tube just above the subject’s tongue. The subjects’ responses change on a line scale while the experimenter could view the conductivity and response voltage outputs on the computer’s display screen, which simultaneously were output to a data file. The system was programmed in FORTRAN subroutines controlling the clock sampling rate and analog-to-digital conversion routine. From Lawless and Clark (1992), reprinted with permission.

    However, despite the availability of computerized systems, some research was still conducted using the simple time cueing at discrete intervals (e.g., Lee and Lawless, 1991; Pionnier et al., 2004), and the semi-manual strip-chart recorder method (Ott et al., 1991; Robichaud and Noble, 1990). A discussion of some common applications of TI methods is given in Section 8.6.

    8.3 Variations on the Method

    8.3.1 Discrete or Discontinuous Sampling

    The sensory scientist has several options for collecting time-dependent sensory data. The methods for time-related scaling can be divided into four groups. The oldest approach is simply to ask the panelists to rate the intensity of sensation during different phases of consuming a food. This is particularly applicable to texture, which may be evaluated in phases such as initial bite, first chew, mastication, and residual. An example of time division during the texture evaluation of a baked product is shown in Table 8.1. When using a descriptive panel, it may be useful to have residual flavor or mouthfeel sensations rated at a few small intervals, e.g., every 30 s for 2 min, or immediately after tasting and then again after expectoration. For an example of this approach used with hot pepper “burn” see Stevens and Lawless (1986). Each measurement is then treated like a separate descriptive attribute and analyzed as a separate variable, with little or no attempt to reconstruct a time-connected record like the TI curve shown in Fig. 8.1. For researchers


    interested in some simple aspect like “strength of bitter aftertaste” this method may suffice.

    Another related approach is to ask for repeated ratings of a single or just a few attributes at repeated smaller time intervals, usually cued by the panel leader or experimenter. These ratings are then connected and graphed on a time axis. This is a simple procedure that can be used to track changes in the intensity of a flavor or texture attribute and requires no special equipment other than a stopwatch or other timing device. The panel must be trained to rate their sensations upon the time cue and to move rapidly through the list of attributes. The cue may be given verbally or on a computer screen. It is not known how many attributes can be rated in this way, but with faster time cueing and shorter intervals, obviously fewer attributes may be rated. This method also requires some faith in the assumption that the attributes are being rated close to the actual time when the cue is given. The accuracy with which panelists can do this is unknown, but given that there is a reaction time delay in any perceptual judgment, there must be some inherent error or delay-related variance built into the procedure. An example of the repeated, discrete time interval method with verbal cueing and multiple attributes can be found in studies of sweetener mixtures (e.g., Ayya and Lawless, 1992) and astringency (Lee and Lawless, 1991). The time record is treated as a connected series and time is analyzed as one factor (i.e., one independent variable) in the statistical analysis.

    Table 8.1 Texture attributes at different phases of descriptive analysis

    Phase        Attribute               Word anchors
    Surface      Roughness               Smooth-rough
                 Particles               None-many
                 Dryness                 Oily-dry
    First bite   Fracturability          Crumbly-brittle
                 Hardness                Soft-hard
                 Particle size           Small-large
    First chew   Denseness               Airy-dense
                 Uniformity of chew      Even-uneven
    Chew down    Moisture absorption     None-much
                 Cohesiveness of mass    Loose-cohesive
                 Toothpacking            None-much
                 Grittiness              None-much
    Residual     Oiliness                Dry-oily
                 Particles               None-much
                 Chalky                  Not chalky-very chalky


    trained to move a mouse diagonally and a visual scale included pointers on both horizontal and vertical scales to represent the intensity of the individual attributes. With a slowly changing product like chewing gum, with a sampling time that is not too frequent (every 9-15 s in this case), the technique would seem to be within the capabilities of human observers to either rapidly shift their attention or to respond to the overall pattern of the combined flavors. However, as currently used, most TI tracking methods must repeat the evaluation in order to track additional attributes. Ideally, this could lead to a composite profile of all the dynamic flavor and texture attributes in a product and how they changed across time. Such an approach was proposed by DeRovira (1996), who showed how the descriptive analysis spider-web plot of multiple attributes could be extended into the time dimension to produce a set of TI curves and thus to characterize an entire profile.


    attributes are rated in each trial, (2) that it is simple and easy to do, requiring little or no training, and (3) that it provides a picture of enhanced differences relative to the TI records. Because panelists are forced to respond to only one attribute at a time, differences in the temporal profile may be accentuated. However, at the time of this publication, no standard procedure seemed to be agreed upon. The technique requires specialized software to collect the information, but at least one of the major sensory software systems has implemented a TDS option. Attributes are assumed to have a score of zero before they become dominant, and some attributes may never be rated. This would seem to necessarily lead to an incomplete record. Different panelists are contributing at different times to different attributes, so statistical methods for comparison of differences between products using the raw data set are difficult. However, products can be statistically compared using the simplified summary scores (summed intensity by duration measures, but losing time information) or by comparison of the proportion of responders (losing intensity information) as in Labbe et al. (2009). Qualitative comparisons can be made from inspecting the curves, such as “this product is initially sweet, then becoming more astringent, compared to product X which is initially sour, then fruity.” Is information provided by TDS and by traditional TI tracking methods different? One study found a strong resemblance of the constructed time curves for both methods, providing virtually redundant information for some attributes (Le Reverend et al., 2008). In another study, correlations with TI parameters were high for intensity maxima versus dominance proportion maxima, as might be expected since a higher number of persons finding an attribute dominant should be related to the mean intensity in TI. 
Correlations with other time-dependent parameters such as time to Imax (Tmax ) and duration measures were low, due to the different information collected in TDS and the limited attention to one attribute at a time (Pineau et al., 2009).
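The proportion-of-responders comparison mentioned above can be illustrated with a short sketch. It assumes a hypothetical data layout that is not taken from any TDS software: each panelist's record is a time-sorted list of (start_time, attribute) events, and an attribute stays dominant until the next event or the end of the trial. The function name and the choice of bin midpoints are also assumptions made for illustration.

```python
def dominance_proportions(records, attributes, n_bins, duration):
    """Proportion of panelists citing each attribute as dominant in each time bin.

    records: one list per panelist of (start_time, attribute) events, sorted by
    time (hypothetical layout). duration: trial length in seconds.
    """
    bin_width = duration / n_bins
    curves = {a: [0.0] * n_bins for a in attributes}
    for events in records:
        for b in range(n_bins):
            t = (b + 0.5) * bin_width  # evaluate dominance at the bin midpoint
            dominant = None            # nothing is dominant before the first event
            for start, attr in events:
                if start <= t:
                    dominant = attr
                else:
                    break
            if dominant is not None:
                curves[dominant][b] += 1.0 / len(records)
    return curves


# Four hypothetical panelists, 20 s trial, four 5 s bins
panel = [
    [(0, "sweet"), (10, "astringent")],
    [(0, "sweet"), (5, "astringent")],
    [(3, "sour"), (10, "sweet")],
    [(0, "sweet")],
]
curves = dominance_proportions(panel, ["sweet", "sour", "astringent"], 4, 20)
for name, values in curves.items():
    print(name, values)
```

Because only the dominant attribute is recorded at each moment, the resulting curves carry no intensity information, which is the limitation noted in the text.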

    8.4 Recommended Procedures

    8.4.1 Steps in Conducting a Time-intensity Study

    The steps in conducting a time-intensity study are similar to those in setting up a descriptive analysis procedure. They are listed in Table 8.2. The first important question is to establish whether TI methods are appropriate to the experimental research objective. Is this a product with just one or a few critical attributes that are likely to vary in some important way in their time course? Is this difference likely to impact consumer acceptance of the product? What are these critical attributes? Next, the product test set should be determined with the research team, and this will influence the experimental design. The TI method itself must be chosen, e.g., discrete point, continuous tracking, or TDS. The sensory specialist should by this time have an idea of what the data set will look like and what parameters can be extracted from the TI records for statistical comparisons such as intensity maxima, time to maxima, areas under the curve, and total duration. Many of the TI curve parameters are often correlated, so there is little need to analyze more than about ten parameters.

    Practice is almost always essential. You cannot assume that a person sitting down in a test booth will know what to do with the TI system and feel comfortable with the mouse or other response device. A protocol for training TI panelists was outlined by Peyvieux and Dijksterhuis (2001) and this protocol or similar versions have been widely adopted. It is also wise that some kind of panel checking be done to make sure the panelists are giving reliable data (see Bloom et al., 1995; Peyvieux and Dijksterhuis, 2001) and to examine the reasonableness of their data records. At this time the researchers and statistical staff should also decide how to handle missing data or records that may have artifacts or be incomplete. As in any sensory study, extensive planning may save a lot of headaches and problems, and this is especially true for TI methods.

    Table 8.2 Steps in conducting a time-intensity study
    1. Determine project objectives: Is TI the right method?
    2. Determine critical attributes to be rated.
    3. Establish products to be used with clients/researchers.
    4. Choose system and/or TI method for data collection.
       a. What is the response task?
       b. What visual feedback is provided to the panelists?
    5. Establish statistical analysis and experimental design.
       a. What parameters are to be compared?
       b. Are multivariate comparisons needed?
    6. Recruit panelists.
    7. Conduct training sessions.
    8. Check panelist performance.
    9. Conduct study.
    10. Analyze data and report.


    8.4.2 Procedures

    If only a few attributes are going to be evaluated, the continuous tracking methods are appropriate and will provide a lot of information. This usually requires the use of computer-assisted data collection. Many, if not all, of the commercial software packages for sensory evaluation data collection have TI modules. The start and stop commands, sampling rate, and inter-trial intervals can usually be specified. The mouse movement will generally produce some visual feedback such as the motion of a cursor or line indicator along a simple line scale. The display often looks like a vertical or horizontal thermometer with the cursor position clearly indicated by a bar or line that rises and falls. The computer record can be treated as raw data for averaging across panelists. However, statistical analysis by simple averaging raises a number of issues (discussed in Section 8.5). A simple approach is to pull characteristic curve parameters off each record for purposes of statistical comparisons such as intensity maximum (Imax), time to maximum (Tmax), and area under the curve (AUC). These are sometimes referred to as “scaffolding parameters” as they represent the fundamental structure of the time records. Statistical comparison of these parameters can provide a clear understanding of how different products are perceived with regard to the onset of sensations, time course of rising and falling sensations, total duration, and total sensory impact of that flavor or textural aspect of the products.

    If a computer-assisted software package is not available or cannot be programmed, the researcher can always choose to use the cued/discontinuous method (e.g., with a stopwatch and verbal commands). This may be suitable for products in which multiple attributes must be rated in order to get the full picture. However, given the widespread availability of commercial sensory data-collection systems in major food and consumer product companies, it is likely that a sensory professional will have access to a continuous tracking option. The starting position of the cursor on the visible scale or computer screen should be considered carefully. For most intensity ratings it makes sense to start at the lower end, but for hedonics (like/dislike) the cursor should begin at the neutral point. For meat tenderness or product melting, the track is usually unidirectional, so the cursor should start at “not tender” or


    “tough” for meat and “not melted” for a product that melts. If the cursor is started at the wrong end of a unidirectional tracking situation, a falsely bi-directional record may be obtained due to the initial movement. An outline of panel training for TI studies was illustrated in a case study by Peyvieux and Dijksterhuis (2001) and the sensory specialist should consider using this or a similar method to ensure good performance of the panelists. These authors had panelists evaluate flavor and texture components of a complex meat product. The prospective panelists were first introduced to the TI method, and then given practice with basic taste solutions over several sessions. The basic tastes were considered simpler than the complex product and more suitable for initial practice. Panelist consistency was checked and a panelist was considered reliable if they could produce two out of three TI records on the same taste stimulus that did not differ more than 40% of the time. A vertical line scale was used and an important specification was when to move the cursor back to zero (when there was no flavor or when the sample was swallowed for the texture attribute of juiciness). Problems were noted with (1) nontraditional curve shapes such as having no return to zero, (2) poor replication by some panelists, and (3) unusable curves due to lack of a landmark such as no Imax. The authors also conducted a traditional profiling (i.e., descriptive analysis) study before the TI evaluations to make sure the attributes were correctly chosen and understood by the panelists. If TI panelists are already chosen from an existing descriptive panel, this step may not be needed. The authors conducted several statistical analyses to check for consistent use of the attributes, to look for oddities in curve shapes by some panelists, and to examine individual replicates. Improvements in consistency and evidence of learning and practice were noted.

    8.4.3 Recommended Analysis

    For purposes of comparing products, the simplest approach is to extract the curve parameters such as Imax, Tmax, AUC, and total duration from each record. Some sensory software systems will generate these measures automatically. Then these curve parameters can be treated like any data points in any sensory evaluation and compared statistically. For three or


    more products, analysis would be by ANOVA and then planned comparisons of means (see Appendix C). Means and significance of differences can be reported in graphs or tables for each curve attribute and product. If a time by intensity curve is desired, the curves can be averaged in the time direction by choosing points at specific time intervals. This averaging method is not without its pitfalls; however, a number of alternative methods are given in the next section. An example of how to produce a simplified averaged curve is given below in the case study of the trapezoidal method (Lallemand et al., 1999).
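As a minimal sketch of the parameter extraction described above, the following function pulls Imax, Tmax, AUC, and total duration from one sampled record. The function name and the dictionary keys are illustrative choices, not the output of any particular sensory software. It assumes a zero baseline, increasing sample times, and approximates total duration as the span between the first and last nonzero samples.

```python
def ti_parameters(times, intensities):
    """Extract basic curve parameters from one TI record.

    times, intensities: equal-length lists; times increasing, baseline at zero.
    """
    i_max = max(intensities)
    t_max = times[intensities.index(i_max)]  # first time the peak is reached
    # Area under the curve by the trapezoidal rule
    auc = sum(
        (t1 - t0) * (y0 + y1) / 2.0
        for t0, t1, y0, y1 in zip(times, times[1:], intensities, intensities[1:])
    )
    # Approximate duration: first to last sample with a nonzero rating
    nonzero_times = [t for t, y in zip(times, intensities) if y > 0]
    duration = nonzero_times[-1] - nonzero_times[0] if nonzero_times else 0.0
    return {"Imax": i_max, "Tmax": t_max, "AUC": auc, "DUR": duration}


# Hypothetical record sampled every 5 s
record_times = [0, 5, 10, 15, 20, 25, 30]
record_values = [0, 2, 6, 8, 4, 1, 0]
print(ti_parameters(record_times, record_values))
```

Each panelist-by-product record then contributes one row of parameters, and each parameter can be submitted to the same ANOVA that would be used for any other descriptive attribute.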

    8.5 Data Analysis Options

    8.5.1 General Approaches

    Two common statistical approaches have been taken to perform hypothesis testing on TI data. Perhaps the most obvious test is simply to treat the raw data at whatever time intervals were sampled as the input data to analyses of variance (ANOVA) (e.g., Lee and Lawless, 1991). This approach results in a very large ANOVA with at least three factors: time, panelists, and the treatments of interest. Time and panelists effects may not be of primary interest but will always show large F-ratios due to the fact that people differ and the sensations change over time. This is not news. Another common pattern is a time-by-treatment interaction since all curves will tend to converge near baseline at later time intervals. This is also to be


    Table 8.3 Parameters extracted from time-intensity curves

    Parameter                     Other names     Definition
    Peak intensity                Imax, Ipeak     Height of highest point on TI record
    Total duration                DUR, Dtotal     Time from onset to return to baseline
    Area under the curve          AUC, Atotal     Self-explanatory
    Plateau                       Dpeak           Time difference between reaching maximum and beginning descent
    Area under plateau            Apeak           Self-explanatory
    Area under descending phase   Ptotal          Area bounded by onset of decline and reaching baseline
    Rising slope                  Ri              Rate of increase (linear fit) or slope of line from onset to peak intensity
    Declining slope               Rf              Rate of decrease (linear fit) or slope of line from initial declining point to baseline
    Extinction                                    Time at which curve terminates at baseline
    Time to peak                  Tmax, Tpeak     Time to reach peak intensity
    Time to half peak             Half-life       Time to reach half maximum in decay portion

    Modified from Lundahl (1992). Other shape parameters are given in Lundahl (1992), based on a half circle of equivalent area under the curve and dividing the half circle into rising and falling phase segments.


    2006), an effect that is sometimes described as an individual “signature.” Examples of individual signatures are shown in Fig. 8.3. The causes of these individual patterns are unknown but could be attributed to differences in anatomy, differences in physiology such as salivary factors (Fischer et al., 1994), different types of oral manipulation or chewing efficiency (Brown et al., 1994; Zimoch and Gullet, 1997), and individual habits of scaling. Some of this information may be lost when analyzing only the extracted parameters.

    [Figure 8.3 plot: INTENSITY versus TIME (sec), 0 to 140 s, with one curve each for Judge 1, Judge 2, and Judge 3]

    Fig. 8.3 Examples of time-intensity records showing characteristic signatures or shapes. Judge 1 shows a record with multiple plateaus, a common occurrence. Judge 2 shows a smooth and continuous curve. Judge 3 shows a steep rise and fall.

    A third approach is to fit some mathematical model or set of equations to each individual record and then use the constants from the model as the data for comparisons of different products (Eilers and Dijksterhuis, 2004; Garrido et al., 2001; Ledauphin et al., 2005, 2006; Wendin et al., 2003). Given the increasing activity and ingenuity in the area of sensometrics, it is likely that such models will continue to be developed. The sensory scientist needs to ask how useful they are in the product testing arena and whether the model fitting is useful in differentiating products. Various approaches to modeling and mathematical description of TI curves are discussed in the next section.

    8.5.2 Methods to Construct or Describe Average Curves

    The analysis of TI records has produced a sustained interest and response from sensometricians, who have


    [Figure 8.4 plot: INTENSITY versus TIME, with curves for Judge 1, Judge 2, and their AVERAGE]

    Fig. 8.4 Two curves with different peak times, if averaged, can lead to a double-peaked curve that resembles neither original data record.
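The double-peak problem in Fig. 8.4 is easy to reproduce numerically. The sketch below uses two hypothetical triangular records (not data from any study) that peak at 20 s and 70 s, averages them at fixed times, and counts the local maxima of the result.

```python
def triangle(times, peak_time, half_width, height):
    """Idealized single-peaked TI record: linear rise and fall around peak_time."""
    return [max(0.0, height * (1 - abs(t - peak_time) / half_width)) for t in times]


def local_maxima(values):
    """Indices of samples strictly higher than both neighbors."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


times = list(range(0, 101, 5))            # samples every 5 s
judge1 = triangle(times, 20, 15, 10)      # early peak
judge2 = triangle(times, 70, 15, 10)      # late peak
average = [(a + b) / 2 for a, b in zip(judge1, judge2)]

# Each judge has one peak, but the fixed-time average has two
print([times[i] for i in local_maxima(average)])
```

The averaged record has two peaks of half the original height, so neither its shape nor its Imax describes what either judge reported.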

    style” either by simple visual inspection of the curves or by a clustering analysis or other statistical methods (van Buuren, 1992; Zimoch and Gullet, 1997). Then these subgroups can be analyzed separately. The analysis may proceed using the simple fixed-time averaging of curve heights or one of the other methods described next.

    An alternative approach is to average in both the intensity and time directions, by setting each individual’s maximum at the mean time to maximum across all curves, and then finding mean times to fixed percentages of maximum in the rising and falling phases of each curve. This procedure was originally published by Overbosch et al. (1986) and subsequent modifications were proposed by Liu and MacFie (1990). The steps in the procedure are shown in Fig. 8.5. Briefly, the method proceeds as follows: In the first step, the geometric mean value for the intensity maximum is found. Individual curves are multiplicatively scaled to have this Imax value. In the second step, the geometric mean time to Imax is calculated. In the next steps, geometric mean times are calculated for fixed percentage “slices” of each curve, i.e., at fixed percentages of Imax. For example, the rising and falling phases are “sliced” at 95% of Imax and 90% of Imax and the geometric mean time values to reach these heights are found. This procedure avoids the kind of double-peaked curve that can arise from simple averaging of two distinctly different curve shapes as shown in Fig. 8.4. The method results in several desirable properties that do not necessarily occur with simple averaging at fixed times. First, the Imax value from the mean curve is the geometric mean of the Imax of the individual curves. Second, the Tmax value from the mean curve is the geometric mean of the Tmax of the individual curves. Third, the endpoint is the geometric mean of all endpoint times. Fourth, all judges contribute to all segments of the curve.

    With simple averaging at fixed times, the tail of the curve may have many judges returned to zero and thus the mean is some small value that is a poor representation of the data at those points. In statistical terms, the distribution of responses at these later time intervals is positively skewed and left-censored (bound by zero). One approach to this problem is to use the simple median as the measure of central tendency (e.g., Lawless and Skinner, 1979). In this case the summary curve goes to zero when over half the judges go to zero. A second approach is to use statistical techniques designed for estimating measures of central tendency and standard deviations from left-censored positively skewed data (Owen and DeRouen, 1980).

    Overbosch’s method works well if all individual curves are smoothly rising and falling with no plateaus or multiple peaks and valleys, and all data begin and end at zero. In practice, the data are not so regular. Some judges may begin to fall after the first maximum and then rise again to a second peak. Due to various common errors, the data may not start and end at zero, for example, the record may be truncated within the allowable time of sampling. To accommodate these problems, Liu and MacFie (1990) developed a modification of the above procedure. In their procedure, Imax and four “time landmarks” were averaged, namely starting time, time to maximum, time at which the curve starts to descend from Imax, and ending time. The ascending and descending phases of each curve were then divided into about 20 time interval slices. At each time interval, the mean I value is calculated. This method then allows for curves with multiple rising and falling phases and a plateau of maximum intensity that is commonly seen in some judges’ records.
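The averaging in both directions attributed above to Overbosch et al. (1986) can be sketched as follows. This is an illustrative reading of the steps as described in the text, not the authors' own code: it assumes smooth single-peaked records that start and end at zero, finds crossing times by linear interpolation, and omits the onset point because a geometric mean requires positive times. The function names and the default slice levels are assumptions.

```python
from statistics import geometric_mean


def crossing_time(times, values, level, rising):
    """Interpolated time at which a single-peaked record crosses `level`
    before the peak (rising=True) or after it (rising=False)."""
    peak = values.index(max(values))
    indices = range(1, peak + 1) if rising else range(peak + 1, len(values))
    for i in indices:
        y0, y1 = values[i - 1], values[i]
        crosses = (y0 < level <= y1) if rising else (y0 > level >= y1)
        if crosses:
            frac = (level - y0) / (y1 - y0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("record never crosses the requested level")


def overbosch_average(records, levels=(0.25, 0.5, 0.75, 0.9, 0.95)):
    """records: list of (times, values) pairs, one per judge.
    Returns composite (time, intensity) points from rising slices to endpoint."""
    gm_imax = geometric_mean([max(v) for _, v in records])
    points = []
    for p in levels:  # rising phase: time to reach p * own Imax
        gm_t = geometric_mean(
            [crossing_time(t, v, p * max(v), True) for t, v in records])
        points.append((gm_t, p * gm_imax))
    gm_tmax = geometric_mean([t[v.index(max(v))] for t, v in records])
    points.append((gm_tmax, gm_imax))
    for p in reversed(levels):  # falling phase
        gm_t = geometric_mean(
            [crossing_time(t, v, p * max(v), False) for t, v in records])
        points.append((gm_t, p * gm_imax))
    gm_end = geometric_mean(
        [crossing_time(t, v, 0.0, False) for t, v in records])
    points.append((gm_end, 0.0))
    return points


# Two hypothetical judges with different peak heights and peak times
judges = [
    ([0, 10, 20, 30, 40], [0, 4, 8, 4, 0]),
    ([0, 40, 80, 120, 160], [0, 1, 2, 1, 0]),
]
for time_point, intensity in overbosch_average(judges, levels=(0.5,)):
    print(round(time_point, 2), round(intensity, 2))
```

Because scaling a curve multiplicatively does not change the time at which it reaches a given percentage of its own maximum, the sketch measures each slice against the judge's own Imax and applies the geometric mean Imax only when placing the composite points.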

    8.5.3 Case Study: Simple Geometric Description

    A simple and elegant method for comparing curves and extracting parameters by a geometric approximation was described by Lallemand et al. (1999). The authors used the method with a trained texture panel to evaluate different ice cream formulations. The labor-intensive nature of TI studies was illustrated in the fact that

    [Figure 8.5, panels A-F: time-intensity records for Judge 1 and Judge 2, marking Imax 1 and Imax 2, the geometric mean Imax, Tmax 1 and Tmax 2, the geometric mean Tmax, the times T1 and T2 at 95% of Imax, and the geometric mean time at 95% of Imax]

    Fig. 8.5 Steps in the data analysis procedure recommended by Overbosch et al. (1986). (a) Two hypothetical time-intensity records from two panelists showing different intensity maxima at different times. (b) The geometric mean value for the intensity maximum is found. Individual curves may then be multiplicatively scaled to have this Imax value. (c) The two Tmax values. (d) The geometric mean time to maximum (Tmax) is calculated. (e) Geometric mean times are calculated for fixed percentage “slices” of each curve, i.e., at fixed percentages of Imax. The rising phase is “sliced” at 95% of Imax and the time values determined. A similar value will be determined at 95% of maximum for the falling phase. (f) The geometric mean times at each percent of maximum are plotted to generate the composite curve.

    12 products were evaluated on 8 different attributes in triplicate sessions, requiring about 300 TI curves from each panelist! Texture panelists were given over 20 sessions of training, although only a few of the final sessions were specifically devoted to practice with the TI procedure. Obviously, this kind of extensive research program requires a significant time and resource commitment. Data were collected using a computer-assisted rating program, where mouse movement was linked to the position of a cursor on a 10 cm 10-point scale. The authors noted a number of issues with the data records due to “mouse artifacts” or other problems. These included sudden unintended movement of the mouse leading to false peaks or ratings after the end of the sensation, mouse blockage leading to unusable records, and occasional inaccurate positioning by panelists causing data that did not reflect their actual perceptions. Such ergonomic difficulties are not uncommon in TI studies although they are rarely reported or discussed. Even with this highly trained panel, from 1 to 3% of the records needed to be discarded or manually corrected due to artifacts or inaccuracies. Sensory professionals should not assume that just because they have a computer-assisted TI system, the human factors in mouse and machine interactions will always work smoothly and as planned. Examples of response artifacts are shown in Fig. 8.6.

    [Figure 8.6 plot: INTENSITY versus TIME (sec), with time points T1, T2, and T3 marked and the labels “False Plateau” and “Mouse artifacts”]

    Fig. 8.6 Response artifacts in TI records. The solid line shows some perhaps unintended mouse movement (muscle spasm?) near the peak intensity. The dashed line shows a bump in the mouse after sensation ceased and returned to zero. The dotted line illustrates an issue in determining at what point the intensity plateau has ended. The short segment between T1 and T2 may have simply been an adjustment of the mouse after the sudden rise, when the panelist felt they overshot the mark. The actual end of the plateau might more reasonably be considered to occur at T3. (see Lallemand et al. 1999).

    Lallemand and coworkers noted that TI curves often took a shape in which the sensation rose to a plateau near peak intensity for a period during which intensity ratings changed very little and then fell to the baseline. They reasoned that a simple geometric approximation by a trapezoid shape might suffice for extracting curve parameters and finding the area under the curve (not unlike the trapezoidal approximation method used for integration in calculus). In principle, four points could be defined that would describe the curve: the onset time, the time at the intensity maximum or beginning of the plateau, the time at which the plateau ended and the decreasing phase began, and the time at which sensation stopped. These landmarks are those originally proposed by Liu and MacFie (1990). In practice, these points turned out to be more difficult to estimate than expected, so some compromises were made. For example, some records would show a gradually decreasing record during the “plateau” and before the segment with a more rapidly falling slope was evident. How much of a decrease would justify the falling phase or conversely, how little of a decrease would be considered still part of the plateau (see Fig. 8.6)? Also, what should one do if the panelist did not return to zero sensation or unintentionally bumped the mouse after reaching zero? In order to solve these issues, the four points were chosen at somewhat interior sections of the curve, namely the times at 5% of the intensity maximum for the onset and endpoint of the trapezoid, and the times at 90% of the intensity maximum for the beginning and end of the plateau. This approximation worked reasonably well, and its application to a hypothetical record is shown in Fig. 8.7. Given the almost 3,000 TI curves in this one study, the trapezoid points were not mapped by hand or by eye, but a special program was written to extract the points. However, for smaller experiments it should be quite feasible to do this kind of analysis “by hand” on any collection of graphed records. The establishment of the four trapezoid vertices now allows extraction of the six basic TI curve parameters for statistical analysis (5 and 90% of maximum intensity points, the four times at those points), as well as the intensity maximum from the original record, and derived (secondary) parameters such as rising and falling slopes and the total area under the curve.
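A minimal sketch of extracting the four trapezoid vertices and the derived parameters is given here. It is not the program used by Lallemand et al. (1999); the function name, the linear interpolation between samples, and the use of the first crossing on each side of the first peak sample are assumptions, and artifacts such as a mouse bump after the return to zero are not handled. The parameter names follow Fig. 8.7.

```python
def trapezoid_parameters(times, values):
    """Trapezoid vertices A, B, C, D (times at 5% and 90% of Imax on the rising
    and falling phases) and derived slopes, durations, and areas."""
    i_max = max(values)
    peak = values.index(i_max)          # first sample at the maximum
    i5, i90 = 0.05 * i_max, 0.90 * i_max

    def interp(i, level):
        # Linear interpolation between samples i-1 and i
        y0, y1 = values[i - 1], values[i]
        return times[i - 1] + (level - y0) / (y1 - y0) * (times[i] - times[i - 1])

    def rising(level):
        for i in range(1, peak + 1):
            if values[i - 1] < level <= values[i]:
                return interp(i, level)
        raise ValueError("level not reached before the peak")

    def falling(level):
        for i in range(peak + 1, len(values)):
            if values[i - 1] > level >= values[i]:
                return interp(i, level)
        raise ValueError("level not reached after the peak")

    t_a, t_b, t_c, t_d = rising(i5), rising(i90), falling(i90), falling(i5)
    height = i90 - i5
    d_i, d_m, d_d = t_b - t_a, t_c - t_b, t_d - t_c
    return {
        "A": t_a, "B": t_b, "C": t_c, "D": t_d,
        "Imax": i_max, "I5": i5, "I90": i90,
        "Di": d_i, "Dm": d_m, "Dd": d_d,
        "Ri": height / d_i,             # rising slope
        "Rd": -height / d_d,            # falling slope (negative)
        "Ai": height * d_i / 2.0,       # rising triangle
        "Am": height * d_m,             # plateau rectangle
        "Ad": height * d_d / 2.0,       # falling triangle
        "total_area": height * (2 * d_m + d_i + d_d) / 2.0,
    }


# Hypothetical record: 10 s rise, 20 s plateau, 10 s fall
params = trapezoid_parameters([0, 10, 20, 30, 40], [0, 10, 10, 10, 0])
print(params["A"], params["B"], params["C"], params["D"], params["total_area"])
```

In this sketch the areas are measured above the 5% level, so the three partial areas sum to the trapezoid formula given in the caption of Fig. 8.7.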
    Note that the total area becomes simply the sum of the two triangles and the rectangle described by the plateau. These are shown in the lower section of Fig. 8.7. A composite trapezoid can be drawn from the averaged points. The utility and validity of the method was illustrated in one sample composite record, showing the fruity flavor intensity from two ice creams differing in fat content. Consistent with what might be expected from the principles of flavor release, the higher fat sample had a slower and more delayed rise to the peak (plateau) but a longer duration. This would be predicted if the higher fat level was better able to sequester

    a lipophilic or nonpolar flavor compound and thus delay the flavor release. They also examined the correlation with a traditional texture descriptive analysis and found very low correlations of individual TI parameters with texture profiling mean scores. This would be expected if the TI parameters were contributing unique information or if the texture profilers were integrating a number of time-dependent events in coming up with their single-point intensity estimates. Consistent with the latter notion, the profiling scores could be better modeled by a combination of several of the TI parameters. The simplicity and validity of this analysis method suggests that it should find wider application in industrial settings.

    [Figure 8.7 diagram: trapezoidal approximation to a TI curve, intensity versus time. Upper panel: A, D = points at 5% of maximum; B, C = points at 90% of maximum. Lower panel, derived parameters: Ri, Ai, Di (rising phase); Am, Dm (plateau); Rd, Ad, Dd (falling phase)]

    Fig. 8.7 The trapezoidal method of Lallemand et al. (1999) for assessing curve parameters on TI records. The upper panel shows the basic scheme in which four points are found when the initial 5% of the intensity maximum (Imax) occurs, when 90% of Imax is first reached on the ascending segment, when the plateau is finished at 90% of Imax on the descending phase, and the endpoint approximation at 5% of Imax on the descending phase. The lower panel shows the derived parameters, namely Ri, Ai, and Di for the rate (slope), area, and duration of the initial rising phase; Am and Dm for the area and duration of the middle plateau section; and Rd, Ad, and Dd for the rate (slope), area, and duration of the falling phase. A total duration can be found from the sum of Di, Dm, and Dd. The total area is given by the sum of the A parameters or by the formula for the area of a trapezoid: Total area = (I90 - I5)(2Dm + Di + Dd)/2 (height times the sum of the two parallel segments, then divided by 2).

    8.5.4 Analysis by Principal Components

    Another analysis uses principal components analysis (PCA, discussed in Chapter 18) (van Buuren, 1992). Briefly, PCA is a statistical method that “bundles” groups of correlated measurements and substitutes a new variable (a factor or principal component) in place of the original variables, thus simplifying the picture. In studying the time-intensity curves for bitterness of different brands of lager beer, van Buuren noticed that individuals once again produced their own characteristic “style” of curve shape. Most people showed a classic TI curve shape, but some subjects were classified as “slow starters,” with a delayed peak, and some showed a tendency to persist and not come back down to baseline within the test period. Submission of the data to PCA allowed the extraction of a “principal curve” which captured the majority trend. This showed a peaked TI curve and a gradual return to baseline. The second principal curve captured the shape of the minority trends, with slow onset, a broad peak, and slow decline without reaching baseline. The principal curves were thus able to extract judge trends and provide a cleaned-up view of the primary shape of the combined data (Zimoch and Gullet, 1997). Although a PCA program may extract a number of principal components, not all may be practically meaningful (for an example, see Reinbach et al., 2009), and the user should examine each one for the story it tells. Reasonable questions are whether the component reflects something important relative to the simple TI curve parameters, and whether it shows any patterns related to individual differences among panelists. Dijksterhuis explored the PCA approach in greater detail (Dijksterhuis, 1993; Dijksterhuis and van den Broek, 1995; Dijksterhuis et al., 1994). Dijksterhuis (1993) noted that the PCA method as applied by van Buuren was not discriminating of different bitter stimuli. An alternative approach was “non-centered PCA” in which curve height information was retained during

    8.6 Examples and Applications

    data processing, rather than normalizing curves to a common scale. The non-centered approach works on the raw data matrix. Stimuli or treatments were better distinguished. The first principal curve tends to look like the simple average, while the second principal curve contains rate information such as accelerations or inflection points (Dijksterhuis et al., 1994). This could be potentially useful information in differentiating the subtle patterns of TI curves for different flavors. The PCA approach also involves the possibility of generating weights for different assessors that indicate the degree to which they contribute to the different principal curves. This could be an important tool for differentiating outliers in the data or panelists with highly unusual TI signatures (Peyvieux and Dijksterhuis, 2001).
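The idea of a first principal curve with per-assessor weights can be sketched in a few lines. This is a minimal illustration only, assuming a judges-by-time matrix of raw ratings; the data, the function name `first_principal_curve`, and the matrix layout are hypothetical and are not taken from the cited studies. Power iteration from the standard library stands in for the SVD routine a statistics package would normally use. No centering or height normalization is applied, which is the sense of "non-centered" described above.

```python
import math

def first_principal_curve(curves, n_iter=200):
    """Return (principal_curve, judge_weights) for a judges x time matrix.

    Non-centered PCA: the raw matrix X is used as-is, so differences in
    curve height are retained. Power iteration on X^T X finds the first
    right singular vector (the principal curve).
    """
    n_time = len(curves[0])
    v = [1.0] * n_time  # starting guess for the principal curve
    for _ in range(n_iter):
        # scores = X v (one value per judge)
        scores = [sum(c[t] * v[t] for t in range(n_time)) for c in curves]
        # v_new = X^T scores, then rescale to unit length
        v_new = [sum(curves[j][t] * scores[j] for j in range(len(curves)))
                 for t in range(n_time)]
        norm = math.sqrt(sum(x * x for x in v_new))
        v = [x / norm for x in v_new]
    # Weight of each judge on the first principal curve
    weights = [sum(c[t] * v[t] for t in range(n_time)) for c in curves]
    return v, weights

# Hypothetical records: three judges, six time points (arbitrary units).
curves = [
    [0, 4, 8, 6, 3, 1],  # classic rise-and-decay shape
    [0, 5, 9, 7, 3, 0],  # classic shape, slightly higher
    [0, 1, 3, 6, 7, 6],  # "slow starter" that persists
]
principal_curve, judge_weights = first_principal_curve(curves)
```

In this toy data set the first principal curve follows the majority shape, and the slow starter receives the smallest weight. A judge with an unusually low weight on the first curve is a candidate for the kind of closer inspection the text describes.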

    8.6 Examples and Applications

    A growing number of studies have used TI methods for the evaluation of flavor, texture, flavor release, hedonics, and basic studies of the chemical senses. A short review of these studies follows in this section, although the reader is cautioned that the list is not exhaustive. We have cited a few of the older studies to give credit to the pioneers of this field as well as some of the newer applications. The examples are meant to show the range of sensory studies for which TI methods are suitable.

    8.6.1 Taste and Flavor Sensation Tracking

    A common application of continuous time-intensity scaling is tracking the sensation rise and decay from important flavor ingredients, such as sweeteners (Swartz, 1980). An early study of Jellinek and Pangborn reported that addition of salt to sucrose extended the time-intensity curve and made the taste “more rounded” in their words (Jellinek, 1964). One of the salient characteristics of many intensive or non-carbohydrate sweeteners is their lingering taste that is different from that of sucrose. Time-intensity tracking of sweet tastes was an active area of study (Dubois and Lee, 1983; Larson-Powers and Pangborn, 1978; Lawless and Skinner, 1979; Yoshida, 1986) and

    Cliff and Noble, 1990; Matysiak and Noble, 1991). Using these techniques, enhancement of sweetness by fruity volatiles has been observed, and differences in the interactions were seen for different sweetening agents.

    8.6.2 Trigeminal and Chemical/Tactile Sensations

    Reactions to other chemical stimuli affecting irritation or tactile effects in the mouth have been a fertile area for time-related sensory measurements. Compounds such as menthol produce extended flavor sensations and the time course is concentration dependent (Dacanay, 1990; Gwartney and Heymann, 1995). A large number of studies of the burning sensation from hot pepper compounds have used time-intensity scaling, using repeated ratings at discrete time intervals as well as continuous tracking (Cliff and Heymann, 1993a; Green, 1989; Green and Lawless, 1991; Lawless and Stevens, 1988; Stevens and Lawless, 1986). Given the slow onset and extended time course of the sensations induced by even a single sample of a food containing hot pepper, this is a highly appropriate application for time-related judgments. The repeated ingestion paradigm has also been used to study the short- and long-term desensitization to irritants such as capsaicin and zingerone (Prescott and Stevenson, 1996). The temporal profile of different irritative spice compounds is an important point of qualitative differentiation (Cliff and Heymann, 1992). Reinbach and colleagues used TI tracking to study the oral heat from capsaicin in various meat products (Reinbach et al., 2007, 2009) as well as the interactions of oral burn with temperature (see also Baron and Penfield, 1996). When examining the decay curves following different pepper compounds, different time courses can be fit by different decay constants in a simple exponential curve of the form

    $$R = R_0 e^{-kt} \qquad (8.1)$$

    or

    $$\ln R = \ln R_0 - kt \qquad (8.2)$$

    where $R_0$ is the peak intensity, $t$ is time, and $k$ is the value determining how rapidly the sensation falls off from peak during the decay portion of the time curve (Lawless, 1984). Another chemically induced tactile or feeling factor in the mouth that has been studied by TI methods is astringency. Continuous tracking has been applied to the astringency sensation over repeated ingestions (Guinard et al., 1986). Continuous tracking can provide a clear record of how sensations change during multiple ingestions, and how flavors may build as subsequent sips or tastings add greater sensations on an existing background. An example of this is shown in Fig. 8.8, where the sawtooth curve shows astringency increasing and building over repeated ingestions (Guinard et al., 1986). Astringency has also been studied using repeated discrete-point scaling. For example, Lawless and coworkers were able to show differences in the time profiles of some sensory sub-qualities related to astringency, namely dryness, roughness in the mouth, and puckery tightening sensations (Lawless et al., 1994; Lee and Lawless, 1991) depending on the astringent materials being evaluated.

    Fig. 8.8 Continuous tracking record with multiple ingestions producing a “sawtooth” curve record. The abscissa shows the time axis in seconds and the ordinate the mean astringent intensity from 24 judges. The dashed curve is from a 15 ml sample of a base wine with 500 gm/l added tannin and the solid curve from the base wine with no added tannin. Sample intake and expectoration are indicated by stars and arrows, respectively. From Guinard et al. (1986), reprinted with permission of the American Society of Enology and Viticulture.

    8.6.3 Taste and Odor Adaptation

    The measurement of a flavor sensation by tracking over time has a close parallel in studies of taste and odor adaptation (Cain, 1974; Lawless and Skinner, 1979; O’Mahony and Wong, 1989). Adaptation may be defined as the decrease in responsiveness of a sensory system to conditions of constant stimulation providing a changing “zero point” (O’Mahony, 1986). Adaptation studies have sometimes used discrete, single bursts of stimulation (e.g., Meiselman and Halpern, 1973). An early study of adaptation used a flowing system through the subject’s entire mouth using pipes inserted through a dental impression material that was held in the teeth (Abrahams et al., 1937). Disappearance of salt taste was achieved in under 30 s for a 5% salt solution, although higher concentrations induced painful sensations over time that were confusing to the subjects and sometimes masked the taste sensation. By flowing a continuous stream over a section of the subject’s tongue or stabilizing the stimulus with wet filter paper, taste often disappears in under a few minutes (Gent, 1979; Gent and McBurney, 1978; Kroeze, 1979; McBurney, 1966). Concentrations above the adapting level are perceived as having the characteristic taste of that substance, e.g., salty for NaCl. Concentrations below that level, to which pure water is the limiting case, take on other tastes so that water after NaCl, for example, is sour-bitter (McBurney and Shick, 1971). Under other conditions adaptation may be incomplete (Dubose et al., 1977; Lawless and Skinner, 1979; Meiselman and Dubose, 1976). Pulsed flow or intermittent stimulation causes adaptation to be much less complete or absent entirely (Meiselman and Halpern, 1973).
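As a worked illustration of the exponential decay model in Eqs. 8.1 and 8.2 (Section 8.6.2), the decay constant k can be estimated by ordinary least squares on the log-transformed ratings, since Eq. 8.2 is a straight line in t with slope -k. The sketch below is one simple way to do this, not the procedure used in the cited studies. The function name `fit_decay` and the data are hypothetical; time is assumed to be measured from the peak, and all ratings are assumed to be greater than zero.

```python
import math

def fit_decay(times, ratings):
    """Least-squares fit of ln R = ln R0 - k*t; returns (R0, k).

    Assumes time is measured from the peak and every rating is > 0,
    because the logarithm of zero is undefined.
    """
    logs = [math.log(r) for r in ratings]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope

# Hypothetical decay record generated from R0 = 10 and k = 0.05 per second.
times = [0, 10, 20, 30, 40, 50, 60]
ratings = [10.0 * math.exp(-0.05 * t) for t in times]

r0, k = fit_decay(times, ratings)
half_life = math.log(2) / k  # time for the rating to fall to half of R0
```

Comparing fitted k values (or the derived half-lives) across compounds gives a single number per decay curve. Real records are noisy and often contain zeros near the end, so ratings at or near zero would need to be excluded or handled separately before taking logarithms.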

    8.6.4 Texture and Phase Change

    Tactile features of foods and consumer products have been evaluated using time-related measurements. Phase change is an important feature of many products that undergo melting when eaten. These include both frozen products like ice cream and other dairy desserts and fatty products with melting points near body temperature such as chocolate. Using the chart-recording method for time-intensity tracking, Moore and Shoemaker (1981) evaluated the degree of coldness, iciness, and sensory viscosity of ice cream with different degrees of added carboxymethyl cellulose. The added carbohydrate shifted the time for peak intensity of iciness and extended the sensations of coldness on the tongue. Moore and Shoemaker also

    8.6.5 Flavor Release

    A potentially fertile area for the application of time-related sensory measurements is in flavor release from foods during eating. Not only are texture changes

    cheese systems (Pionnier et al., 2004) using a discrete point TI method.

    8.6.6 Temporal Aspects of Hedonics

    Since the pleasantness or appeal of a sensory characteristic is largely dependent on its intensity level, it is not surprising that one’s hedonic reaction to a product might shift over time as the strength of a flavor waxes and wanes. Time-related shifts in food likes and dislikes are well known. In the phenomenon known as alliesthesia, our liking for a food depends a lot on whether we are hungry or replete (Cabanac, 1971). The delightful lobster dinner we enjoyed last night may not look quite so appealing as leftovers at lunchtime the next day. Wine tasters may speak of a wine that is “closed in the glass, open on the palate, and having a long finish.” Accompanying such a description is the implicit message that this particular wine got better over the course of the sensory experience. Given the shorter time span of flavor and texture sensations in the mouth, we can ask whether there are shifts in liking and disliking. This has been measured in several studies. Taylor and Pangborn (1990) examined liking for chocolate milk with varying degrees of milk fat. Different individual trends were observed in liking for different concentrations, and this affected the degree of liking expressed over time. Another example of hedonic TI scaling was in a study of the liking/disliking for the burning oral sensation from chili (hot) pepper (Rozin et al., 1982). They found different patterns of temporal shifting in liking as the burn rose and declined. Some subjects liked the burn at all time intervals, some disliked the burn at all time intervals, and some shifted across neutrality as strong burns became more tolerable. This method was revisited by Veldhuizen et al. (2006), who used a simple bipolar line scale for pleasantness and had subjects evaluate both intensity and hedonic reactions to a citrus beverage flowed over the tongue from a computer-controlled delivery system (see Fig. 8.2 for an early example of this kind of device). Note that with a bipolar hedonic scale, the mouse and cursor positions must begin at the center of the scale and not the lower end as with intensity scaling. The authors found a delayed pleasantness response compared to the intensity tracking, a similar time to maximum, but an unexpectedly quicker offset of response for pleasantness tracking. Some panelists produced a double-peaked pleasantness response, as the sensation could rise in pleasantness, then become too intense, and then become more pleasant again as adaptation set in and the perceived strength decreased.

    8.7 Issues

    Sensory scientists who wish to use time-intensity methods for any particular study need to weigh the potential for obtaining actionable information against the cost and time involved in collecting these data. Some orientation or training is required (Peyvieux and Dijksterhuis, 2001) and in some published studies, the training and practice are quite extensive. For example, Zimoch and Gullet (1997) trained their meat texture panel for 12 h. Panelists must be trained to use the response device and sufficiently practiced to feel comfortable with the requirements of the task in terms of maintaining focused attention to momentary sensation changes. With the use of online data collection the tabulation and processing of information is generally not very labor intensive, but without computer-assisted collection the time involved can be enormous. Even with computerized systems, the data collection is not foolproof. Responses may be truncated or fail to start at zero in some records (Liu and MacFie, 1990; McGowan and Lee, 2006), making automatic averaging of records infeasible. In one study of melting behavior (Lawless et al., 1996), some subjects mistakenly returned the indicating cursor to zero as soon as the product was completely melted, instead of leaving the cursor on maximum, producing truncated records. Such unexpected events remind us never to assume that panelists are doing what we think they should be doing. A fundamental issue is information gain. In the case where changes in duration are observed at equal maximum intensities, it can be argued that the traditional scaling might have missed important sensory differences. For example, TI can capture information such as when the TI curves cross over, e.g., the interesting case when a product with a lower peak intensity has a longer duration (e.g., Lallemand et al., 1999; Lawless et al., 1996). However, this pattern is not often seen. Usually products with stronger peak height have longer durations.
In general there is a lot of redundant information in TI parameters. Lundahl (1992) studied the correlation of 15 TI parameters associated with TI
    curves’ shapes, sizes, and rates of change. Curve size parameters were highly correlated and usually loaded on the first principal component of a PCA, capturing most of the variance (see also Cliff and Noble, 1990). Curve size parameters, including peak height, were correlated with simple category ratings of the same beverages. So an open question for the sensory scientist is whether there is any unique information in the TI parameters extracted from the records and whether there is information gain over what would be provided by more simple direct scaling using a single intensity rating. A potential problem in time-intensity methods is that factors affecting response behavior are not well understood. In TI measurements, there are dynamic physical processes (chewing, salivary dilution) leading to changes in the stimulus and resulting sensations (Fischer et al., 1994). A second group of processes concerns how the participant translates the conscious experience into an overt response, including a decision mechanism and motoric activation (Dijksterhuis, 1996). The notion that TI methods provide a direct link from the tongue of the subject to the hand moving the mouse is a fantasy. Even in continuous tracking, there must be some decision process involved. There is no information as to how often a panelist in the continuous procedure reflects upon the sensation and decides to change the position of the response device. Decisions are probably not continuous even though some records from some subjects may look like smooth curves. An indication that response tendencies are important is when the conditions of stimulation are held constant, but the response task changes. For example, using the graphic chart-recorder method, Lawless and Skinner (1979) found median durations for sucrose intensity that were 15-35% shorter than the same stimuli rated using repeated category ratings. Why would the different rating methods produce apparently different durations? 
Very different patterns may be observed when taste quality and intensity are tracked. Halpern (1991) found that tracked taste quality of 2 mM sodium saccharin had a delayed onset (by 400 ms) compared with tracked intensity. This might be understandable from the point of requiring a more complex decision process in the case of tracking quality. However, it still alerts us to the fact that the behavior probably trails the actual experience by some unknown amount. What is more surprising in Halpern’s data is that tracked quality also stopped well before tracked intensity (by
    600 ms). Can it be possible that subjects are still experiencing a taste of some definable intensity and yet the quality has disappeared? Or is there another response generation process at work in this task? A third area where potential response biases can operate is in contextual effects. Clark and Lawless (1994) showed that common contextual effects like successive contrast also operated with TI methods, as they do with other scaling tasks. Also, some ratings could be enhanced when a limited number of scales were used by subjects. As observed in single-point scaling, enhancement of sweetness by fruity flavors tends to occur when only sweetness is rated. When the fruity flavor is also rated, the sweetness enhancement often disappears (Frank et al., 1989), an effect sometimes referred to as halo dumping or simply “dumping.” Using the discrete-point version of TI scaling, so that multiple attributes could be rated, Clark and Lawless showed a similar effect. This is potentially troublesome for the continuous tracking methods, since they often limit subjects to responding to only one attribute at a time. This may explain in part why sweetness enhancement by flavors can occur so readily in TI studies (e.g., Matysiak and Noble, 1991). A final concern is the question of whether the bounded response scales often used in TI measurement produce any compression of the differences among products. In analog tracking tasks, there is a limit as to how far the joystick, mouse, lever, dial, or other response device can be moved. With some practice, judges learn not to bump into the top. Yet the very nature of the tracking response encourages judges to sweep a wide range of the response scale. If this were done on every trial, it would tend to attenuate the differences in maximum tracked intensity between products. As an example, Overbosch et al. (1986, see Fig. 2) showed curves for pentanone where doubling the concentration changed peak heights by only about 8%.
A similar sort of compression is visible in Lawless and Skinner’s (1979) data for sucrose, compared to the psychophysical data in the literature.
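The redundancy among TI parameters discussed in this section (Lundahl, 1992) can be checked on one's own data by extracting a few parameters per record and correlating them across products. The sketch below is only illustrative. The records, the function names `ti_parameters` and `pearson`, and the operational definition of duration (time from the first to the last nonzero rating) are assumptions made for the example, not definitions from the cited studies.

```python
import math

def ti_parameters(times, intensities):
    """Extract a few common summary parameters from one TI record."""
    i_max = max(intensities)                  # peak intensity
    t_max = times[intensities.index(i_max)]   # time of the first peak
    nonzero = [t for t, y in zip(times, intensities) if y > 0]
    # Duration is taken here as first-to-last nonzero rating (an assumption).
    duration = nonzero[-1] - nonzero[0] if nonzero else 0
    # Area under the curve by the trapezoid rule.
    auc = sum((t2 - t1) * (y1 + y2) / 2
              for t1, t2, y1, y2 in zip(times, times[1:],
                                        intensities, intensities[1:]))
    return {"i_max": i_max, "t_max": t_max, "duration": duration, "auc": auc}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical mean records for four products, rated every 5 s.
times = [0, 5, 10, 15, 20, 25, 30]
records = {
    "A": [0, 2, 4, 3, 1, 0, 0],
    "B": [0, 3, 6, 5, 3, 1, 0],
    "C": [0, 4, 8, 7, 5, 3, 1],
    "D": [0, 1, 2, 1, 0, 0, 0],
}
params = {name: ti_parameters(times, y) for name, y in records.items()}
i_max_values = [p["i_max"] for p in params.values()]
auc_values = [p["auc"] for p in params.values()]
duration_values = [p["duration"] for p in params.values()]

r_peak_auc = pearson(i_max_values, auc_values)
r_peak_duration = pearson(i_max_values, duration_values)
```

In this constructed example, peak height, area, and duration are almost perfectly correlated, which is the redundant pattern the text describes as usual. Correlations well below 1 on real data would suggest that the extra parameters carry information a single intensity rating would miss.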

    8.8 Conclusions

    In most cases TI parameters show similar statistical differentiation as compared to traditional scales, but this is not universally the case (e.g., Moore and Shoemaker, 1981).

    Many sensory evaluation researchers have supported increased application of time-intensity measurements for characterization of flavor and texture sensations. In particular, the method was championed by Lee and Pangborn, who argued that the methods provide detailed information not available from single estimates of sensation intensity (Lee, 1989; Lee and Pangborn, 1986). TI methods can provide rate-related, duration, and intensity information not available from traditional scaling. However, the utility of the methods must be weighed against the enhanced cost and complexity in data collection and analysis. In deciding whether to apply TI methods over conventional scaling, the sensory scientist should consider the following criteria: (1) Is the attribute or system being studied known to change over time? Simply eating the food can often settle this issue; in many cases it is obvious. (2) Will the products differ in sensory time course as a function of ingredients, processing, packaging, or other variables of interest? (3) Will the time variation occur in such a way that it will probably not be captured by direct single ratings? (4) Is some aspect of the temporal profile likely to be related to consumer acceptability? (5) Does the added information provided by the technique outweigh any additional costs or time delays in panel training, data acquisition, and data analysis? Obviously, when more answers are positive on these criteria, a stronger case can be made for choosing a TI method from the available set of sensory evaluation tools.

    References

    Abrahams, H., Krakauer, D. and Dallenbach, K. M. 1937. Gustatory adaptation to salt. American Journal of Psychology, 49, 462-469. Ayya, N. and Lawless, H. T. 1992. Qualitative and quantitative evaluation of high-intensity sweeteners and sweetener mixtures. Chemical Senses, 17, 245-259. Baron, R. F. and Penfield, M. P. 1996. Capsaicin heat intensity – concentration, carrier, fat level and serving temperature effects. Journal of Sensory Studies, 11, 295-316. Barylko-Pikielna, N., Mateszewska, I. and Helleman, U. 1990. Effect of salt on time-intensity characteristics of bread. Lebensmittel Wissenschaft und Technologie, 23, 422-426.

    Birch, G. G. and Munton, S. L. 1981. Use of the “SMURF” in taste analysis. Chemical Senses, 6, 45-52. Bloom, K., Duizer, L. M. and Findlay, C. J. 1995. An objective numerical method of assessing the reliability of time-intensity panelists. Journal of Sensory Studies, 10, 285-294. Bonnans, S. and Noble, A. C. 1993. Effect of sweetener type and of sweetener and acid levels on temporal perception of sweetness, sourness and fruitiness. Chemical Senses, 18, 273-283. Brandt, M. A., Skinner, E. Z. and Coleman, J. A. 1963. Texture profile method. Journal of Food Science, 28, 404-409. Brown, W. E., Landgley, K. R., Martin, A. and MacFie, H. J. 1994. Characterisation of patterns of chewing behavior in human subjects and their influence on texture perception. Journal of Texture Studies, 15, 33-48. Butler, G., Poste, L. M., Mackie, D. A., and Jones, A. 1996. Time-intensity as a tool for the measurement of meat tenderness. Food Quality and Preference, 7, 193-204. Cabanac, M. 1971. Physiological role of pleasure. Science, 173, 1103-1107. Cain, W. S. 1974. Perception of odor intensity and time-course of olfactory adaptation. ASHRAE Transactions, 80, 53-75. Clark, C. C. and Lawless, H. T. 1994. Limiting response alternatives in time-intensity scaling: An examination of the Halo-Dumping effect. Chemical Senses, 19, 583-594. Cliff, M. 1987. Temporal perception of sweetness and fruitiness and their interaction in a model system. MS Thesis, University of California, Davis, USA. Cliff, M. and Heymann, H. 1992. Descriptive analysis of oral pungency. Journal of Sensory Studies, 7, 279-290. Cliff, M. and Heymann, H. 1993a. Time-intensity evaluation of oral burn. Journal of Sensory Studies, 8, 201-211. Cliff, M. and Heymann, H. 1993b. Development and use of time-intensity methodology for sensory evaluation: A review. Food Research International, 26, 375-385. Cliff, M. and Noble, A. C. 1990. Time-intensity evaluation of sweetness and fruitiness in a model solution. Journal of Food Science, 55, 450-454. Dacanay, L. 1990. Thermal and concentration effects on temporal sensory attributes of L-menthol. M.S. Thesis, University of California, Davis, USA. de Roos, K. B. 1990. Flavor release from chewing gums. In: Y. Bessiere and A. F. Thomas (eds.), Flavour Science and Technology. Wiley, Chichester, pp. 355-362. DeRovira, D. 1996. The dynamic flavor profile method. Food Technology, 50, 55-60. Dijksterhuis, G. 1993. Principal component analysis of time-intensity bitterness curves. Journal of Sensory Studies, 8, 317-328. Dijksterhuis, G. 1996. Time-intensity methodology: Review and preview. Proceedings, COST96 Meeting: Interaction of Food Matrix with Small Ligands Influencing Flavour and Texture, Dijon, France, November 20, 1995. Dijksterhuis, G. and van den Broek, E. 1995. Matching the shape of time-intensity curves. Journal of Sensory Studies, 10, 149-161. Dijksterhuis, G., Flipsen, M. and Punter, P. H. 1994. Principal component analysis of time-intensity data. Food Quality and Preference, 5, 121-127. DuBois, G. E. and Lee, J. F. 1983. A simple technique for the evaluation of temporal taste properties. Chemical Senses, 7, 237-247.

    DuBois (eds.), Sweeteners: Discovery, Molecular Design and Chemoreception. ACS Symposium Series #450. American Chemical Society, Washington, DC, pp. 277-289. Jellinek, G. 1964. Introduction to and critical review of modern methods of sensory analysis (odor taste and flavor evaluation) with special emphasis on descriptive analysis. Journal of Nutrition and Dietetics, 1, 219-260. Jellinek, G. 1985. Sensory Evaluation of Food, Theory and Practice. Ellis Horwood, Chichester, England. Kroeze, J. H. A. 1979. Masking and adaptation of sugar sweetness intensity. Physiology and Behavior, 22, 347-351. Kuo, Y.-L., Pangborn, R. M. and Noble, A. C. 1993. Temporal patterns of nasal, oral and retronasal perception of citral and vanillin and interactions of these odorants with selected tastants. International Journal of Food Science and Technology, 28, 127-137. Labbe, D., Schlich, P., Pineau, N., Gilbert, F. and Martin, N. 2009. Temporal dominance of sensations and sensory profiling: A comparative study. Food Quality and Preference, 20, 216-221. Lallemand, M., Giboreau, A., Rytz, A. and Colas, B. 1999. Extracting parameters from time-intensity curves using a trapezoid model: The example of some sensory attributes of ice cream. Journal of Sensory Studies, 14, 387-399. Larson-Powers, N. and Pangborn, R. M. 1978. Paired comparison and time-intensity measurements of the sensory properties of beverages and gelatins containing sucrose or synthetic sweeteners. Journal of Food Science, 43, 41-46. Lawless, H. T. 1980. A computerized system for assessing taste intensity over time. Paper presented at the Chemical Senses and Intake Society, Hartford, CT, April 9, 1980. Lawless, H. T. 1984. Oral chemical irritation: Psychophysical properties. Chemical Senses, 9, 143-155. Lawless, H. T. and Clark, C. C. 1992. Psychological biases in time-intensity scaling. Food Technology, 46(11), 81, 84-86, 90. Lawless, H. T. and Skinner, E. Z. 1979. The duration and perceived intensity of sucrose taste.
Perception and Psychophysics, 25, 249-258. Lawless, H. T. and Stevens, D. A. 1988. Responses by humans to oral chemical irritants as a function of locus of stimulation. Perception and Psychophysics, 43, 72-78. Lawless, H. T., Corrigan, C. L. and Lee, C. L. 1994. Interactions of astringent substances. Chemical Senses, 19, 141-154. Lawless, H. T., Tuorila, H., Jouppila, K., Virtanen, P. and Horne, J. 1996. Effects of guar gum and microcrystalline cellulose on sensory and thermal properties of a high fat model food system. Journal of Texture Studies 27, 493-516. Le Reverend, F. M., Hidrio, C., Fernandes, A. and Aubry, V. 2008. Comparison between temporal dominance of sensation and time intensity results. Food Quality and Preference, 19, 174-178. Leach, E. J. and Noble, A. C. 1986. Comparison of bitterness of caffeine and quinine by a time-intensity procedure. Chemical Senses, 11, 339-345. Ledauphin, S., Vigneau, E. and Causeur, D. 2005. Functional approach for the analysis of time intensity curves using B-splines. Journal of Sensory Studies, 20, 285-300. Ledauphin, S., Vigneau, E. and Qannari, E. M. 2006. A procedure for analysis of time intensity curves. Food Quality and Preference, 17, 290-295.

    Lee, C. B. and Lawless, H. T. 1991. Time-course of astringent materials. Chemical Senses, 16, 225-238. Lee, W. E. 1985. Evaluation of time-intensity sensory responses using a personal computer. Journal of Food Science, 50, 1750-1751. Lee, W. E. 1986. A suggested instrumental technique for studying dynamic flavor release from food products. Journal of Food Science, 51, 249-250. Lee, W. E. 1989. Single-point vs. time-intensity sensory measurements: An informational entropy analysis. Journal of Sensory Studies, 4, 19-30. Lee, W. E. and Pangborn, R. M. 1986. Time-intensity: The temporal aspects of sensory perception. Food Technology, 40, 71-78, 82. Liu, Y. H. and MacFie, H. J. H. 1990. Methods for averaging time-intensity curves. Chemical Senses, 15, 471-484. Lundahl, D. S. 1992. Comparing time-intensity to category scales in sensory evaluation. Food Technology, 46(11), 98-103. Lynch, J., Liu, Y.-H., Mela, D. J. and MacFie, H. J. H. 1993. A time-intensity study of the effect of oil mouthcoatings on taste perception. Chemical Senses, 18, 121-129. Matysiak, N. L. and Noble, A. C. 1991. Comparison of temporal perception of fruitiness in model systems sweetened with aspartame, aspartame + acesulfame K blend or sucrose. Journal of Food Science, 65, 823-826. McBurney, D. H. 1966. Magnitude estimation of the taste of sodium chloride after adaptation to sodium chloride. Journal of Experimental Psychology, 72, 869-873. McBurney, D. H. and Shick, T. R. 1971. Taste and water taste of 26 compounds for man. Perception and Psychophysics, 11, 228-232. McGowan, B. A. and Lee, S.-Y. 2006. Comparison of methods to analyze time-intensity curves in a corn zein chewing gum study. Food Quality and Preference 17, 296-306. McNulty, P. B. 1987. Flavour release-elusive and dynamic. In: J. M. V. Blanshard and P. Lillford (eds.), Food Structure and Behavior. Academic, London, pp. 245-258. McNulty, P. B. and Moskowitz, H. R. 1974. Intensity -time curves for flavored oil-in-water emulsions. 
Journal of Food Science, 39, 55-57. Meiselman, H. L. 1968. Magnitude estimation of the time course of gustatory adaptation. Perception and Psychophysics, 4, 193-196. Meiselman, H. L. and Dubose, C. N. 1976. Failure of instructional set to affect completeness of taste adaptation. Perception and Psychophysics, 19, 226-230. Meiselman, H. L. and Halpern, B. P. 1973. Enhancement of taste intensity through pulsatile stimulation. Physiology and Behavior, 11, 713-716. Moore, L. J. and Shoemaker, C. F. 1981. Sensory textural properties of stabilized ice cream. Journal of Food Science, 46, 399-402, 409. Neilson, A. J. 1957. Time-intensity studies. Drug and Cosmetic Industry, 80, 452-453, 534. O’Keefe, S. F., Resurreccion, A. P., Wilson, L. A. and Murphy, P. A. 1991. Temperature effect on binding of volatile flavor compounds to soy protein in aqueous model systems. Journal of Food Science, 56, 802-806. O’Mahony, M. 1986. Sensory adaptation. Journal of Sensory Studies, 1, 237-257.

    O’Mahony, M. and Wong, S.-Y. 1989. Time-intensity scaling with judges trained to use a calibrated scale: Adaptation, salty and umami tastes. Journal of Sensory Studies, 3, 217-236. Ott, D. B., Edwards, C. L. and Palmer, S. J. 1991. Perceived taste intensity and duration of nutritive and non-nutritive sweeteners in water using time-intensity (T-I) evaluations. Journal of Food Science, 56, 535-542. Overbosch, P. 1987. Flavour release and perception. In: M. Martens, G. A. Dalen and H. Russwurm (eds.), Flavour Science and Technology. Wiley, New York, pp. 291-300. Overbosch, P., Van den Enden, J. C., and Keur, B. M. 1986. An improved method for measuring perceived intensity/time relationships in human taste and smell. Chemical Senses, 11, 315-338. Owen, W. J. and DeRouen, T. A. 1980. Estimation of the mean for lognormal data containing zeroes and left-censored values, with application to the measurement of worker exposure to air contaminants. Biometrics, 36, 707-719. Pangborn, R. M. and Koyasako, A. 1981. Time-course of viscosity, sweetness and flavor in chocolate desserts. Journal of Texture Studies, 12, 141-150. Pangborn, R. M., Lewis, M. J. and Yamashita, J. F. 1983. Comparison of time-intensity with category scaling of bitterness of iso-alpha-acids in model systems and in beer. Journal of the Institute of Brewing, 89, 349-355. Peyvieux, C. and Dijksterhuis, G. 2001. Training a sensory panel for TI: A case study. Food Quality and Preference, 12, 19-28. Pineau, N., Schlich, P., Cordelle, S., Mathonniere, C., Issanchou, S., Imbert, A., Rogeaux, M., Eteviant, P. and Köster, E. 2009. Temporal dominance of sensations: Construction of the TDS curves and comparison with time-intensity. Food Quality and Preference, 20, 450-455. Pionnier, E., Nicklaus, S., Chabanet, C., Mioche, L., Taylor, A. J., LeQuere, J. L. and Salles, C. 2004. Flavor perception of a model cheese: relationships with oral and physico-chemical parameters. Food Quality and Preference, 15, 843-852. Prescott, J. and Stevenson, R. J. 1996. Psychophysical responses to single and multiple presentations of the oral irritant zingerone: Relationship to frequency of chili consumption. Physiology and Behavior, 60, 617-624. Reinbach, H. C., Toft, M. and Møller, P. 2009. Relationship between oral burn and temperature in chili spiced pork patties evaluated by time-intensity. Food Quality and Preference, 20, 42-49. Reinbach, H. C., Meinert, L., Ballabio, D., Aayslyng, M. D., Bredie, W. L. P., Olsen, K. and Møller, P. 2007. Interactions between oral burn, meat flavor and texture in chili spiced pork patties evaluated by time-intensity. Food Quality and Preference, 18, 909-919. Rine, S. D. 1987. Computerized analysis of the sensory properties of peanut butter. M. S. Thesis, University of California, Davis, USA.

    Chapter 9

    Context Effects and Biases in Sensory Judgment

    Abstract Human judgments about a sensation or a product are strongly influenced by items that surround the item of interest, either in space or in time. This chapter shows how judgments can change as a function of the context within which a product is evaluated. Various contextual effects and biases are described and categorized. Some solutions and courses of action to minimize these biases are presented.

    By such general principles of action as these everything looked at, felt, smelt or heard comes to be located in a more or less definite position relatively to other collateral things either actually presented or only imagined as possibly there. – James (1913, p. 342)

    Contents

    9.1 Introduction: The Relative Nature of Human Judgment
    9.2 Simple Contrast Effects
        9.2.1 A Little Theory: Adaptation Level
        9.2.2 Intensity Shifts
        9.2.3 Quality Shifts
        9.2.4 Hedonic Shifts
        9.2.5 Explanations for Contrast
    9.3 Range and Frequency Effects
        9.3.1 A Little More Theory: Parducci’s Range and Frequency Principles
        9.3.2 Range Effects
        9.3.3 Frequency Effects
    9.4 Biases
        9.4.1 Idiosyncratic Scale Usage and Number Bias
        9.4.2 Poulton’s Classifications
        9.4.3 Response Range Effects
        9.4.4 The Centering Bias
    9.5 Response Correlation and Response Restriction
        9.5.1 Response Correlation
        9.5.2 “Dumping” Effects: Inflation Due to Response Restriction in Profiling
        9.5.3 Over-Partitioning
    9.6 Classical Psychological Errors and Other Biases
        9.6.1 Errors in Structured Sequences: Anticipation and Habituation
        9.6.2 The Stimulus Error
        9.6.3 Positional or Order Bias
    9.7 Antidotes
        9.7.1 Avoid or Minimize
        9.7.2 Randomization and Counterbalancing
        9.7.3 Stabilization and Calibration
        9.7.4 Interpretation
    9.8 Conclusions
    References

    9.1 Introduction: The Relative Nature of Human Judgment

    This chapter will discuss context effects and common biases that can affect sensory judgments. Context effects are conditions in which the judgment about a product, usually a scaled rating, will shift depending upon factors such as the other products that are evaluated in the same tasting session. A mediocre product evaluated in the context of some poor-quality items may seem very good in comparison. Biases refer to tendencies in judgment in which the response is influenced in some way to be an inaccurate reflection of the actual sensory experience. In magnitude estimation ratings, for example, people have a tendency to use numbers that are multiples of 2, 5, and 10, even though they can use any number or fraction they wish. At the

    H.T. Lawless, H. Heymann, Sensory Evaluation of Food, Food Science Text Series, DOI 10.1007/978-1-4419-6488-5_9, © Springer Science+Business Media, LLC 2010


    end of the chapter, some solutions to these problems are offered, although a sensory scientist should realize that we can never totally eliminate these factors. In fact, they are of interest and deserve study on their own for what they can tell us about human sensory and cognitive processes.

    An axiom of perceptual psychology has it that humans are very poor absolute measuring instruments but are very good at comparing things. For example, we may have difficulty estimating the exact sweetness level of our coffee, but we have little trouble in telling whether more sugar has been added to make it sweeter. The question arises, if people are prone to making comparisons, how can they give ratings when no comparison is requested or specified? For example, when asked to rate the perceived firmness of a food sample, how do they judge what is firm versus what is soft? Obviously, they must either choose a frame of reference for the range of firmness to be judged or be trained with explicit reference standards to understand what is high and low on the response scale. In other words, they must relate this sensory judgment to other products they have tried.

    For many items encountered in everyday life, we have established frames of reference based on our experiences. We have no trouble forming an image of a “large mouse running up the trunk of a small elephant” because we have established frames of reference for what constitutes the average mouse and the average elephant. In this case the judgment of large and small is context dependent. Some people would argue that all judgments are relative. This dependence upon a frame of reference in making sensory judgments demonstrates the influence of contextual factors in biasing or changing how products are evaluated. We are always prone to see things against a background of previous experience and evaluate them accordingly. A 40 °F day in Ithaca, New York, in January seems quite mild against the background of the northeastern American winter. However, the same 40 °F temperature will feel quite cool on an evening in August in the same location.

    This principle of frame of reference is the source of many visual illusions, where the same physical stimulus causes very different perceptual impressions, due to the context within which it is embedded. Examples are shown in Fig. 9.1. A simple demonstration of context is the visual afterimage effect that gave rise to Emmert’s law (Boring, 1942). In 1881, Emmert formalized a


    Fig. 9.1 Examples of contextual effects from simple visual illusions. (a) The dumbbell version of the Müller-Lyer illusion. (b) The Ebbinghaus illusion. (c) Illusory contours. In this case the contexts induce the perceptions of shapes.

    principle of size constancy based on the following effect: Stare for about 30 s at a brightly illuminated colored paper rectangle (it helps to have a small dot to aid in fixation in the center) about a meter away. Then shift your gaze to a white sheet on the table in front of you. You should see the rectangle afterimage in a complementary color and somewhat smaller in size as compared to the original colored rectangle. Next, shift your gaze to a white wall some distance off. The afterimage will now appear much larger, as the brain takes a fixed visual angle at a greater distance to represent a larger physical object. Since the mind does not immediately recognize that the afterimage is just a creation of the visual sensory system, it projects it at the distance of the surface upon which it is “seen.” The more distant frame of reference, then, demands a larger size perception.

    The close link between sensory judgments and context presents problems for anyone who wants to view ratings as absolute or comparable across different times, sessions, or settings. Even when the actual sensory impression of two items is the same, we can shift the frame of reference and change the overt behavior of the person to produce a different response. This problem (or principle of sensory function) was glossed


    over by early psychophysical scientists. In psychological terms, they used a simple stimulus-response (S-R) model, in which response was considered a direct and unbiased representation of sensory experiences. Certain biases were observed, but it was felt that suitable experimental controls could minimize or eliminate them (Birnbaum, 1982; Poulton, 1989). A more modern view is that there are two or three distinct processes contributing to ratings. The first is a psychophysical process by which stimulus energy is translated into physiological events that result in a subjective experience of some sensory intensity. The second, equally important process is the function by which the subjective experience is translated into the observed response, i.e., how the percept is translated onto the rating scale (see Fig. 9.2). Many psychophysical researchers now consider a “judgment function” to

    A) Stimulus -> Response
    B) Stimulus -> [Psychophysical Process] -> Sensation -> [Judgment Process] -> Response
    C) Stimulus -> [Psychophysical Process] -> Sensation -> [Perceptual Process] -> Encoded sensation, “percept” -> [Judgment Process] -> Response

    Fig. 9.2 Models for sensory-response processes. (a) The simple stimulus-response model of twentieth-century behavioral psychology. (b) Two processes are involved in sensation and response, a psychophysical process and then a response output or a judgment process in which the participant decides what response to give for that sensation. (c) A more complex model in which the sensation may be transformed before the response is generated. It may exist in short-term memory as an encoded percept, different from the sensation. Contextual effects of simultaneous or sequential stimuli can influence the stimulus-response sequence in several ways. Peripheral physiological effects such as adaptation or mixture inhibition may change the transduction process or other early stages of neural processing. Other stimuli may give rise to separate percepts that are integrated into the final response. Contextual factors may also influence the frame of reference that determines how the response output function will be applied. In some models, an additional step allows transformation of the percept into covert responses that are then translated as a separate step into the overt response R.


    be an important part of the sequence from stimulus to response (Anderson, 1974; Birnbaum, 1982; McBride and Anderson, 1990; Schifferstein and Frijters, 1992; Ward, 1987). This process is also sometimes referred to as a response output function. A third intermediate step is the conversion of the raw sensory experience into some kind of encoded percept, one that is available to memory for a short time, before the judgment is made (Fig. 9.2c).

    Given this framework, there are several points at which stimulus context may influence the sensory process. First, of course, the actual sensation itself may change. Many sensory processes involve interaction effects of simultaneous or sequential influences of multiple items. An item may be perceived differently due to the direct influence of one stimulus upon another that is nearby in time or space. Simultaneous color contrast is an example in color vision, and some types of inhibitory mixture interactions and masking in taste and smell are similarly hard wired. Quinine with added salt is less bitter than quinine tasted alone, due to the ways that sodium ions inhibit bitterness transduction. Sensory adaptation weakens the perception of a stimulus because of what has preceded it. So the psychophysical process itself is altered by the milieu in which the stimulus is observed, sometimes because of physical effects (e.g., simple buffering of an acid) or physiological effects (e.g., neural inhibition causing mixture suppression) in the peripheral sensory mechanisms.

    A second point of influence is when the context shifts the frame of reference for the response output function. That is, two sensations may have the same subjective intensity under two conditions, but because of the way the observer places them along the response continuum (due to different contexts), they are rated differently. A number of studies have shown that contextual factors such as the distribution of stimuli along the physical continuum affect primarily (although not exclusively) the response output function (Mellers and Birnbaum, 1982, 1983). A third process is sometimes added in which the sensation itself is translated into an implicit response or encoded image that may also be affected by context (Fig. 9.2c). This would provide another opportunity to influence the process if contextual factors affect this encoding step.

    Contextual change can be viewed as a form of bias. Bias, in this sense, is a process that causes a shift or a change in response to a constant sensation. If one situation is viewed as producing a true


    and accurate response, then the contextual conditions that cause shifts away from this accurate response are “biased.” However, bias need only have a negative connotation if there is a reason to presume that one condition of judgment is more accurate than all others. A broader view is to accept the idea that all judgments are a function of observing conditions and therefore all judgments are biased relative to one another in different ways. Fortunately, many of these biases and the conditions that cause them are predictable and well understood, and so they can be eliminated or minimized. At the very least, the sensory practitioner needs to understand how these influences operate so as to know when to expect changes in judgments and ratings. An important endpoint is the realization that few, if any, ratings have any absolute meaning. You cannot say that because a product received a hedonic rating of 7.0 today, it is better than the product that received a rating of 6.5 last week. The context may have changed.
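    The two-process view in Fig. 9.2b can be illustrated with a minimal sketch. It is not a model taken from the studies cited above: the power-law psychophysical function, its parameters, and the function names `sensation` and `rating` are all assumptions chosen for illustration. The point is only that an identical sensation can yield different ratings once the judgment process places it within a different frame of reference.

```python
# Hypothetical two-stage sketch: a psychophysical process followed by a
# judgment process. The power-law form and all parameter values are
# illustrative assumptions, not values from the chapter.

def sensation(concentration, exponent=0.5, k=10.0):
    """Psychophysical process: stimulus energy -> subjective intensity."""
    return k * concentration ** exponent


def rating(percept, context_percepts, scale_min=1, scale_max=9):
    """Judgment process: place a percept on the response scale relative to
    the weakest and strongest percepts in the current frame of reference."""
    frame = context_percepts + [percept]
    low, high = min(frame), max(frame)
    if high == low:
        # No spread in the frame of reference: use the scale midpoint.
        return (scale_min + scale_max) / 2
    position = (percept - low) / (high - low)
    return scale_min + position * (scale_max - scale_min)


# The target produces exactly the same sensation in both sessions.
target = sensation(0.25)
weak_context = [sensation(c) for c in (0.12, 0.18)]
strong_context = [sensation(c) for c in (0.35, 0.50)]

rating_among_weak = rating(target, weak_context)
rating_among_strong = rating(target, strong_context)
print(rating_among_weak, rating_among_strong)
```

    With only three items per session the target lands at one end of the scale, which deliberately exaggerates the shift. Real panelists carry a broader frame of reference, so observed shifts are smaller.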

    9.2 Simple Contrast Effects

    By far the most common effect of sensory context is simple contrast. Any stimulus will be judged as more intense in the presence of a weaker stimulus and as less intense in the presence of a stronger stimulus, all other conditions being equal. This effect is much easier to find and to demonstrate than its opposite, convergence or assimilation. For example, an early sensory worker at the Quartermaster Corps, Kamenetsky (1957), noticed that the acceptability ratings for foods seemed to depend upon what other foods were presented during an evaluation session. Poor foods seemed even worse when preceded by a good sample. Convergence is more difficult to demonstrate, although under some conditions a group of items may seem more similar to each other when they are in the presence of an item that is very different from that group (Zellner et al., 2006).

    9.2.1 A Little Theory: Adaptation Level

    As we noted above, a 40 °F day in January (in New York) seems a lot warmer than the same temperature in August. These kinds of effects are predicted by Helson’s


    theory of adaptation level. Helson (1964) proposed that we take as a frame of reference the average level of stimulation that has preceded the item to be evaluated. The mild temperature in the middle of a hot and humid summer seems a lot more cool and refreshing than is the mild temperature after a period of cold and icy weather. So we refer to our most recent experiences in evaluating the sensory properties of an item. Helson went on to elaborate the theory to include both immediate and distant predecessors. That is, he appreciated the fact that more recent items tend to have a stronger effect on the adaptation level. Of course, mere reference to the mean value of experience is not always sufficient to induce a contrast effect; it is more influential if the mean value comes to be centered near the middle of the response scale, an example of a centering bias, discussed below (Poulton, 1989).

    The notion of adaptation, a decrease in responsiveness under conditions of constant stimulation, is a major theme in the literature on sensory processes. Physiological adaptation or an adjustment to the ambient level of stimulation is obvious in light/dark adaptation in vision. The thermal and tactile senses also show profound adaptation effects: we become easily adjusted to the ambient room temperature (as long as it is not too extreme) and we become unaware of tactile stimulation from our clothing. So this mean reference level often passes from consciousness or becomes a new baseline from which deviations in the environment become noticeable. Some workers have even suggested that this improves discrimination, that is, that the difference threshold is smallest right around the adaptation level or physiological zero, in keeping with Weber’s law (McBurney, 1966). Examples of adaptation effects are discussed in Chapter 2 for the senses of taste and smell. In the chemical, thermal, and tactile senses, adaptation is quite profound.

    However, we need not invoke the concept of neural adaptation to a preceding item or a physiological effect to explain all contrast effects. It may be simply that more or less extreme stimuli change our frame of reference or the way in which the stimulus range and response scales are to be mapped onto one another. The general principle of context is that human observers act like measuring instruments that constantly recalibrate themselves to the experienced frame of reference. What we think of as a small horse may depend upon whether the frame of reference includes Clydesdales, Shetland ponies, or tiny prehistoric equine species. The


    following examples show simple effects of context on intensity, sensory quality, and hedonics or acceptability. Most of these examples are cases of perceptual contrast or a shift in judgment away from other stimuli presented in the same session.
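    The adaptation-level idea from Section 9.2.1 can be sketched in a few lines. This is a hedged illustration, not Helson's own formulation: it assumes the frame of reference is a recency-weighted mean of log intensities, with a hypothetical geometric weighting (`recency`) so that more recent items count more. The function names and stimulus values are likewise invented for the example.

```python
import math


def adaptation_level(previous, recency=0.5):
    """Recency-weighted mean of log10 intensities.

    previous[-1] is the most recent item. An item k steps back gets weight
    recency**k (a hypothetical weighting scheme, chosen for illustration).
    """
    logs = [math.log10(x) for x in reversed(previous)]  # most recent first
    weights = [recency ** k for k in range(len(logs))]
    return sum(w * v for w, v in zip(weights, logs)) / sum(weights)


def relative_judgment(stimulus, previous, recency=0.5):
    """Positive: the item seems stronger than the current frame of reference.
    Negative: it seems weaker."""
    return math.log10(stimulus) - adaptation_level(previous, recency)


mild = 0.25  # same physical stimulus in both sessions
after_weak = relative_judgment(mild, [0.12, 0.18, 0.12])
after_strong = relative_judgment(mild, [0.50, 0.35, 0.50])
print(after_weak, after_strong)
```

    Under these assumptions the same item is judged above the adaptation level after weak predecessors and below it after strong ones, which is the direction of the contrast effects described in this section.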

    9.2.2 Intensity Shifts

    Figure 9.3 shows a simple contrast effect of soups with varying salt levels presented in different contexts (Lawless, 1983). The central stimulus in the series was presented either with two lower or two higher concentrations of salt added to a low sodium soup. Ratings of saltiness intensity were made on a simple nine-point category scale. In the lower context, the central soup received a higher rating, and in the higher context, it received a lower rating, analogous to our perception of a mild day in winter (seemingly warmer) versus a mild day in summer (seemingly cooler). Note that the shift is quite dramatic, about two points on the nine-point scale or close to 25% of scale range. A simple classroom demonstration can show a similar shift for the tactile roughness of sandpapers varying in grit size. In the context of a rougher sample, a medium sample will be rated lower than it is in the context of a smoother sample. The effects of simple contrast are not limited to taste and smell.

    [Figure: mean category rating (y-axis, 2 to 7) versus NaCl concentration on a log scale (x-axis, 0.12 to 0.50 M), plotted separately for the low context and the high context.]

    Fig. 9.3 Saltiness ratings of soups with added NaCl. The sample at 0.25 M was evaluated in two contexts, one with higher concentrations and one with lower concentrations. The shift is typical of a simple contrast effect. Replotted from Lawless (1983). Copyright ASTM, used with permission.

    Contrast effects are not always observed. In some psychophysical work with long series of stimuli, some item-to-item correlations have been observed. The effects of immediately preceding versus remotely preceding stimuli have been measured and a positive correlation among adjacent responses in the series was found. This can be taken as evidence for a type of assimilation, or underestimation of differences (Ward, 1979, 1987; but see also Schifferstein and Frijters, 1992).
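    One simple way to look for the item-to-item correlation mentioned above is a lag-1 correlation of response residuals. The sketch below is an illustration only and is not the analysis used in the cited studies: it removes each stimulus's mean rating and then correlates every residual with the one before it. The function names and the two toy response series are made up for the example.

```python
from collections import defaultdict


def pearson(x, y):
    """Plain Pearson correlation for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lag1_residual_correlation(stimuli, responses):
    """Correlate each response's deviation from its stimulus mean with the
    deviation of the immediately preceding response."""
    by_stimulus = defaultdict(list)
    for s, r in zip(stimuli, responses):
        by_stimulus[s].append(r)
    means = {s: sum(v) / len(v) for s, v in by_stimulus.items()}
    residuals = [r - means[s] for s, r in zip(stimuli, responses)]
    return pearson(residuals[:-1], residuals[1:])


# Toy data: the same stimulus rated twelve times in a row.
same_item = ["A"] * 12
drifting = [4, 5, 6, 6, 5, 4, 4, 5, 6, 6, 5, 4]      # neighbors resemble each other
alternating = [4, 6, 4, 6, 4, 6, 4, 6, 4, 6, 4, 6]   # neighbors oppose each other

r_drifting = lag1_residual_correlation(same_item, drifting)
r_alternating = lag1_residual_correlation(same_item, alternating)
print(r_drifting, r_alternating)
```

    A positive value is the pattern read above as assimilation, and a negative value is what sequential contrast would produce. With real data, the design (order of presentation, number of repeats per stimulus) would need to support this kind of sequential analysis.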

    9.2.3 Quality Shifts

    Visual examples such as color contrast were well known to early psychologists like William James: “Meanwhile it is an undoubted general fact that the psychical effect of incoming currents does depend on what other currents may be simultaneously pouring in. Not only the perceptibility of the object which the current brings before the mind, but the quality of it is changed by the other currents.” (1913, p. 25). A gray line against a yellow background may appear somewhat bluish, and the same line against a blue background may seem more yellowish. Paintings of the renowned artist Josef Albers made excellent use of color contrast.

    Similar effects can be observed for the chemical senses. During a descriptive panel training period for fragrance evaluation, the terpene aroma compound dihydromyrcenol was presented among a set of woody or pine-like reference materials. The panelists complained that the aroma was too citrus-like to be included among the woody reference materials. However, when the same odor was placed in the context of citrus reference materials, the same panelists claimed that it was far too woody and pine-like to be included among the citrus examples. This contextual shift is shown in Fig. 9.4. In a citrus context, the item is rated as more woody in character than when placed in a woody context. Conversely, ratings for citrus intensity decrease in the citrus context and increase in the woody context. The effect is quite robust and is seen whether or not a rest period is included to undo the potential effects of sensory adaptation. It even occurs when the contextual odor follows the target item and judgments are made after both are experienced (Lawless et al., 1991)! This striking effect is discussed further in Section 9.2.5.


    [Figure: mean intensity rating of dihydromyrcenol (y-axis, 1 to 9) by odor characteristic (x-axis: citrus rating, woody rating) in the citrus context and the woody context. Two panels: citrus/woody scales used during exposure; intensity/pleasantness scales used during exposure.]

    Fig. 9.4 Odor quality contrast noted for the ambiguous terpene aroma compound dihydromyrcenol. In a citrus context, woody ratings increase and citrus character decreases. In a woody context, the woody ratings decrease. The group using different scales did not rate citrus and woody character during the contextual exposure phase; only overall intensity and pleasantness were rated. From Lawless et al. (1991) by permission of Oxford University Press.

    Context effects can also alter how items are identified and characterized. When people categorize speech sounds, repeated exposure to one type of simple phoneme changes the category boundary for other speech sounds. Repeated exposure to the sound of the phoneme “bah,” which has an early voice onset time, can shift the phoneme boundary so that speech sounds near the boundary are more likely classified as “pah” sounds (a later voice onset) (Eimas and Corbit, 1973). Boundary-level examples are shifted across the boundary and into the next category. This shift resembles a kind of contrast effect.

    9.2.4 Hedonic Shifts

    Changes in the preference or acceptance of foods can be seen as a function of context. Hedonic contrast was a well-known effect to early workers in food acceptance testing (Hanson et al., 1955; Kamenetzky, 1959). An item seems more appealing if it followed an item of poor quality and less appealing if it followed something of better quality. The effect was known to Beebe-Center (1932), who also attributed it to Fechner in 1898. This kind of contrast has been observed for tastes (Riskey et al., 1979; Schifferstein, 1995), odors (Sandusky and Parducci, 1965), and art (Dolese et al., 2005). Another effect observed in these kinds of experiments is that a contrasting item causes other, generally lower rated stimuli to become more similar or less discriminable, an effect termed condensation (Parker et al., 2002; Zellner et al., 2006). In the study by Zellner et al. (2006), pre-exposure to a good-tasting juice reduced the magnitude of preference ratings among less appealing juices. Mediocre items were both worse and more similar.

    An example of hedonic shifting was found in a study on the optimization of the saltiness of tomato juice and also the sweetness of a fruit beverage using the method of adjustment (Mattes and Lawless, 1985). When trying to optimize the level of sweetness or saltiness in this study, subjects worked in two directions. In an ascending series, they would concentrate a dilute solution by mixing the beverage with a more concentrated


    version having the same color, aroma, and other flavor materials (i.e., only sweetness or saltiness was different). In a second descending series, they would be given a very intense sample as the starting point and then dilute down to their preferred level. The two series did not arrive at the same endpoint; this effect is shown in Fig. 9.5. The adjustment stops too soon in each direction, and the discrepancy is remarkable, nearly a concentration range of 2:1. The effect was also robust: it could not be attributed to sensory adaptation or lack of discrimination and persisted even when subjects were financially motivated to try to achieve the same endpoints in both trials. This is a case of affective contrast. When compared to a very sweet or salty starting point, a somewhat lower item seems just about right, but when starting with a relatively sour fruit beverage

    Fig. 9.5 Optimized concentrations of salt in tomato juice and sucrose in a fruit beverage. In the trials labeled D, the concentration was diluted from a concentrated version of the test sample. In the trials marked A, the concentration was increased from a dilute version of the test sample. Concentrations of other ingredients were held constant. The contextual shift is consistent with reaching the apparent optimum too soon as if the apparent optimum was shifted in contrast to the starting point. From Mattes and Lawless (1985) with permission.


    or bland tomato juice, just a little bit of sugar or salt helps quite a bit. The stopping point contrasts with the starting material and seems to be better than it would be perceived in isolation. In an ascending or descending sequence of products, a change in responses that happens too soon is called an “error of anticipation.”

    9.2.5 Explanations for Contrast

    At first glance, one is tempted to seek a physiological explanation for contrast effects, rather than a psychological or a judgmental one. Certainly sensory adaptation to a series of intense stimuli would cause any subsequent test item to be rated much lower. The importance of sensory adaptation in the chemical senses of taste and smell lends some credence to this explanation. However, a number of studies have shown that precautions against sensory adaptation may be taken, such as sufficient rinsing or time delays between stimuli, and yet the context effects persist (Lawless et al., 1991; Mattes and Lawless, 1985; Riskey, 1982). Furthermore, it is difficult to see how sensory adaptation to low-intensity stimuli would cause an increase in the ratings for a stronger item, as adaptation necessarily causes a decrement in physiological responsiveness compared to a no-stimulation baseline.

    Perhaps the best evidence against a simple adaptation explanation for contrast effects is from the reversed-pair experiments in which the contextual item follows the to-be-rated target item and therefore can have no physiologically adapting effect on it. This paradigm calls for a judgment of the target item from memory after the presentation of the contextual item, in what has been termed a reversed-pair procedure (Diehl et al., 1978). Due to the reversed order, the context effects cannot be blamed on physiological adaptation of receptors, since the contextual item follows rather than precedes the item to be rated. Reversed-pair effects are seen for shifts in odor quality of aroma compounds like dihydromyrcenol and are only slightly smaller in magnitude than the contextual shift caused when the contextual item comes first (Lawless et al., 1991). The reversed-pair situation is also quite capable of causing simple contrast effects in sensory intensity. A sweetness shift was observed when a higher or a lower sweetness item was interpolated between the tasting and rating (from memory)


    of a normal-strength fruit beverage (Lawless, 1994). Looking back at Fig. 9.2, it seems more likely that the effect changes the response function. However, not all workers in the field agree. In particular, Marks (1994) has argued that the contextual shifts are much like an adaptation process and that for auditory stimuli this is a peripheral event. It is possible that the change is not to the sensation or experience itself but to some encoded version of the sensation, or to some kind of implicit response, not yet verbalized. If a person, when rating, is evaluating some memory trace of the experience, it is possible that this memory, for example, could be altered.

    9.3 Range and Frequency Effects

    Two of the most common factors that can affect ratings are the sensory range of the products to be evaluated and the frequency with which people use the available response options. These factors were nicely integrated into a theory that helped to explain shifts in category ratings. They are also general tendencies that can affect just about any ratings or responses.

    9.3.1 A Little More Theory: Parducci’s Range and Frequency Principles

    Parducci (1965, 1974) sought to go beyond Helson’s (1964) simple idea that people respond to the mean or the average of their sensory experiences in determining the frame of reference for judgment. Instead, he asserted that the entire distribution of items in a psychophysical experiment would influence the judgments of a particular stimulus. If this distribution was denser (bunched up) at the low end and a lot of weak items were presented, product ratings would shift up. Parducci (1965, 1974) proposed that behavior in a rating task was a compromise between two principles. The first was the range principle. Subjects use the categories to subdivide the available scale range and will tend to divide the scale into equal perceptual segments. The second was the frequency principle. Over many judgments, people like to use the categories an equal number of times (Parducci, 1974). Thus it is not only


    the average level that is important but also how stimuli may be grouped or spaced along the continuum that would determine how the response scale was used. Category scaling behavior could be predicted as a compromise between the effects of the range and frequency principles (Parducci and Perrett, 1971).
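    The compromise between the two principles is often written as a weighted average, J = wR + (1 - w)F, where R is an item's position within the stimulus range and F is its rank position within the set. The sketch below assumes that form; the weight `w`, the linear mapping onto nine categories, the function name, and the two stimulus sets are illustrative choices rather than values from the text.

```python
def range_frequency_ratings(stimuli, w=0.5, n_categories=9):
    """Predicted category ratings under an assumed range-frequency compromise.

    R = position of the item between the weakest and strongest stimulus.
    F = rank position of the item in the whole set (ties share a mean rank).
    J = w * R + (1 - w) * F, then mapped linearly onto 1..n_categories.
    """
    lo, hi = min(stimuli), max(stimuli)
    n = len(stimuli)
    if n < 2 or hi == lo:
        raise ValueError("need at least two distinct stimulus values")
    ordered = sorted(stimuli)
    ratings = []
    for s in stimuli:
        range_value = (s - lo) / (hi - lo)
        first = ordered.index(s)                 # lowest rank holding this value
        last = n - 1 - ordered[::-1].index(s)    # highest rank holding this value
        frequency_value = ((first + last) / 2) / (n - 1)
        judgment = w * range_value + (1 - w) * frequency_value
        ratings.append(1 + judgment * (n_categories - 1))
    return ratings


# Same range (1 to 9) and same target item (5), different bunching.
dense_low = [1, 1, 1, 2, 2, 3, 5, 9]    # many weak items
dense_high = [1, 5, 7, 8, 8, 9, 9, 9]   # many strong items

target_in_dense_low = range_frequency_ratings(dense_low)[dense_low.index(5)]
target_in_dense_high = range_frequency_ratings(dense_high)[dense_high.index(5)]
print(target_in_dense_low, target_in_dense_high)
```

    Because the range is identical in both sets, the range term for the target is the same and the difference comes entirely from the frequency term: the target is rated higher when weak items are numerous, in line with the shift described above.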

    9.3.2 Range Effects The range effect has been known for some time, both in category ratings and other judgments including ratio scaling (Engen and Levy, 1958; Teghtsoonian and Teghtsoonian, 1978). When expanding or shrinking the overall range of products, subjects will map their experiences onto the available categories (Poulton, 1989). Thus short ranges produce steep psychophysical functions and wide ranges produce flatter functions. An example of this can be seen in two published experiments on rating scales (Lawless and Malone, 1986a, b). In these studies, four types of response scales and a number of visual, tactile, and olfactory continua were used to compare the abilities of consumers to use the different scales to differentiate products. In the first study, the consumers had no trouble in differentiating the products and so in the second study, the stimuli were spaced more closely on the physical continua so that the task would be more challenging. However, when the experimenters closed the stimuli in, the range principle took over, and participants used more of the rating scale than expected. In Fig. 9.6, ratings for perceived thickness of (stirred) silicone samples are shown in the wide and narrow stimulus ranges. Note the steepening of the response function. For the same one log unit change in physical viscosity, the range of responses actually doubled from the wide range to the narrower range. Another kind of stimulus range effect occurs with anchor stimuli. Sarris (1967) pviously showed a strong effect of anchor stimuli on the use of rating scales, unless the anchors were very extreme, at which point their influence would tend to diminish, as if they had become irrelevant to the judgmental frame of reference. Sarris and Parducci (1978) found similar effects of both single and multiple end anchors that generally take the form of a contrast effect. 
For example, a low anchor stimulus, whether rated or unrated, will cause stronger stimuli to receive higher ratings than they would if no anchor were presented, unless the “anchor” is so extreme as to seem irrelevant. Sarris and Parducci (1978) provided the following analogy: A salesman will consider his commissions to be larger if coworkers receive less than he does. However, he is unlikely to extend or shift his scale of judgment by hearing about others who are in an entirely different bracket of income (1978, p. 39). Whether an outlying product is similar enough to have an influence (or whether it is a “horse of a different color”) should be of concern in product comparisons of diverse items.

    Fig. 9.6 A simple range effect. When products are presented over a wide range, a shallow psychophysical function is found. Over a narrow range, a steeper psychophysical function will be observed. This is in part due to the tendency of subjects to map the products (once known) onto the available scale range. From Lawless and Malone (1986b), with permission.
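    The steepening in Fig. 9.6 follows from simple arithmetic if one assumes, as an idealization, that subjects stretch whatever stimulus range they are given across the whole response scale. The sketch below uses a 9-point scale and made-up viscosity values (neither is taken from Lawless and Malone) to show how halving the range in log units doubles the slope.

```python
import math


def mapped_slope(stim_min, stim_max, scale_min=1.0, scale_max=9.0):
    """Rating units per log10 unit if the presented range fills the scale."""
    log_span = math.log10(stim_max) - math.log10(stim_min)
    return (scale_max - scale_min) / log_span


# Hypothetical viscosity ranges: two log units wide versus one log unit wide.
wide = mapped_slope(10.0, 1000.0)
narrow = mapped_slope(31.6, 316.0)
print(wide, narrow, narrow / wide)
```

    Real panels rarely fill the scale completely, so the ratio observed in practice depends on how strongly the mapping tendency operates for a given product set.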

    9.3.3 Frequency Effects

    The frequency effect is the tendency of people to try to use the available response options about the same number of times across a series of products or stimuli to be rated. The frequency effect can cause shifts that look like simple contrast and also a local steepening of the psychophysical function around points where stimuli were closely spaced or very numerous (compared with less “dense” portions of the stimulus range). The frequency principle dictates that when judging many samples, products that are numerous or bunched at the low or high ends of the distributions tend to be spread out into neighboring categories. This is illustrated in the two panels of Fig. 9.7. The upper panel shows four hypothetical experiments and how products might be bunched in different parts of the range. In the upper left panel, we see how a normal replicated psychophysical experiment would be conducted with equal presentations of each stimulus level. The common outcome of such a study using category ratings would be a simple linear function of the log of stimulus intensity. However, if the stimulus presentations were more frequent at the high end of the distribution, i.e., negative skew, the upper categories would be overused, and subjects would begin to distribute their judgments into lower categories. If the samples were bunched at the lower end, the lower response categories would be overused and subjects would begin to move into higher categories. If the stimuli were bunched in the midrange, the adjacent categories would be used to take on some of the middle stimuli, pushing extreme stimuli into the ends of the response range, as shown in the panel for a quasi-normal distribution.

    Such behavior is relevant to applied testing situations. For example, in rating the presence of off-flavors or taints, there may be very few examples of items with high values on the scale and lots of weak (or zero) sensations. The frequency effect may explain why the low end of the scale is used less often than anticipated, and higher mean values are obtained than one would deem appropriate. Another example is screening a number of flavor or fragrance candidates for a new product. A large number of good candidates are sent for testing by suppliers or a flavor development group. Presumably these have been pre-tested or at least have received a favorable opinion from a flavorist or a perfumer. Why do they then get only mediocre ratings from the test panel? The high end of the distribution is overrepresented (justifiably so and perhaps on purpose), so the tendency for the panel is to drop into lower categories. This may partly explain why in-house testing panels are sometimes more critical or negative than consumers when evaluating the same items.

    Although the great majority of experiments on the range and frequency effects have been performed with simple visual stimuli, there are also examples from taste evaluation (Lee et al., 2001; Riskey et al., 1979; Riskey, 1982; Schifferstein and Frijters, 1992; Vollmecke, 1987). Schifferstein and Frijters found similar effects of skewed distributions with line-marking responses as seen in previous studies with category ratings. Perhaps line marking is not a response scale with infinite divisions, but panelists subdivide the line into discrete sub-ranges as if they were using a limited number of zones or categories. The effect of grouping or spacing products also intensifies as the exposure to the distributions increases. Lawless (1983) showed that the shift that occurred with a negative skew (bunching at the upper end) into lower response categories would intensify as the exposure to the skewed distribution went from none to a single exposure to three exposures. Thus the contextual effects do not suddenly appear but will take hold of the subjects’ behavior as they gain experience with the sample set.

    [Figure 9.7: two sets of four panels (rectangular distribution, quasi-normal distribution, positive skew, negative skew). Upper set: number of stimuli presented versus stimulus intensity (log scale). Lower set: predicted rating versus stimulus intensity (log scale).]

    Fig. 9.7 Predictions from the Parducci range-frequency theory. Distributions of stimuli that are concentrated at one part of the perceptual range (upper quartet) will show local steepening of the psychophysical functions (lower quartet). This is due to subjects’ tendencies to use categories with equal frequency and the resulting shift into adjacent categories from those that are overused.

    9.4 Biases

    9.4.1 Idiosyncratic Scale Usage and Number Bias

    People appear to have preferred ranges or numbers on the response scale that they feel comfortable using. Giovanni and Pangborn (1983) noted that people using magnitude estimation very often used numbers that were multiples of 2 and 5 (or obviously 10), an effect that is well known in the psychophysical literature (Baird and Noma, 1978). With magnitude estimation, the idiosyncratic usage of a favorite range of numbers causes a correlation of the power function exponents across different sensory continua for an individual (Jones and Marcus, 1961; Jones and Woskow, 1966). This correlation can be explained if people are more or less expansive (versus restrictive) in their response output functions, i.e., in how they apply numbers to their sensations in magnitude estimation studies. Another version of such personal idiosyncrasy is the common observation in time-intensity scaling that people show a kind of personal “signature” or a characteristic curve shape (Dijksterhuis, 1993; McGowan and Lee, 2006).

    Another version of self-induced response restriction can be seen when people use only selected portions of the scale in a line-marking rating task. On a line scale with verbal labels, people may choose to make markings only near the verbal labels, rather than distributing them across the response scale. This was first observed by Eng (1948) with a simple hedonic line scale labeled Like Very Highly at one end, Dislike Very Highly at the other, and Neither Like nor Dislike at the center. In a group of 40 consumers, 24 used only the three labeled parts of the scale, and Eng deleted them from the data analysis! This kind of behavior was also noted with the labeled affective magnitude scale (LAM scale) by Cardello et al. (2008) with both Army laboratory and student groups. Lawless et al. (2010a) found a very high frequency (sometimes above 80%) of people making marks within ±2 mm of a phrase mark on the LAM scale in a multi-city consumer central location test. Lawless et al. (2010b) found that instructions did not seem to change this behavior much but that expanding the physical size of the scale on the ballot (from about 120 to 200 mm) decreased the “categorical” behavior somewhat. Categorical rating behavior can also be seen as a step function in time-intensity records (rather than a smooth continuous curve).

    Finding product differences against the background of these individual response tendencies can be facilitated by within-subject experimental designs. Each participant is used as his or her own baseline in comparisons of products, as in dependent t-tests or repeated-measures analysis of variance in complete block designs. Another approach is to compute a difference score for a comparison of products in each individual’s data, rather than merely averaging across people and looking at differences between mean values.
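    The difference-score approach can be illustrated with a short calculation. The sketch below computes one difference per panelist and a dependent (paired) t statistic from those differences; the ratings are invented for the example, and the function name is hypothetical rather than part of any statistics package.

```python
import math
import statistics


def paired_difference_test(product_a, product_b):
    """Return (mean difference, t statistic, degrees of freedom).

    Each position holds one panelist's ratings of the two products, so
    every panelist serves as his or her own baseline.
    """
    if len(product_a) != len(product_b):
        raise ValueError("each panelist must rate both products")
    diffs = [a - b for a, b in zip(product_a, product_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (requires some spread in diffs).
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_error, n - 1


# Hypothetical panel: people favor very different parts of the scale,
# yet nearly everyone rates product A about one point above product B.
ratings_a = [3, 7, 5, 9, 4, 6]
ratings_b = [2, 6, 4, 7, 3, 5]
mean_diff, t_value, df = paired_difference_test(ratings_a, ratings_b)
print(mean_diff, t_value, df)
```

    Because the large between-person spread in scale usage cancels out of the differences, the comparison rests only on how consistently each person separated the two products.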

    9.4.2 Poulton’s Classifications

    Poulton (1989) published extensively on biases in ratings and classified them. Biases in Poulton’s system go beyond Parducci’s theory but are documented in the psychophysical literature. These include centering biases, contraction biases, logarithmic response bias with numerical ratings, and a general transfer bias that is seen when subjects carry the context from a previous session or study into a new experiment. The centering bias is especially relevant to just-right scales and is discussed in a later section. The response range bias is also a special case and follows this section.

    The contraction biases are all forms of assimilation, the opposite of contrast. According to Poulton, people may rate a stimulus relative to a reference or a mean value that they hold in memory for similar types of sensory events. They tend to judge new items as being close (perhaps too close) to this reference value, causing underestimation of high values and overestimation of low values. There may also be overestimation of an item when it follows a stronger standard stimulus or underestimation when it follows a weaker standard stimulus, a sort of local contraction effect. Poulton also classifies the tendency to gravitate toward the middle of the response range as a type of contraction effect, called a response contraction bias. While all of these effects undoubtedly occur, the question arises as to whether contrast or assimilation is a more common and potent process in human sensory judgment. While some evidence for response assimilation has been found in psychophysical experiments through sequential analysis of response correlations (Ward, 1979), contrast seems much more to be the rule with taste stimuli (Schifferstein and Frijters, 1992) and foods (Kamenetzky, 1959). In our experience, assimilation effects are not as prevalent as contrast effects, although assimilation has certainly been observed in experiments on consumer expectation (e.g., Cardello and Sawyer, 1992). In that case, the assimilation is not toward other actual stimuli but toward expected levels.

    The logarithmic response bias can be observed with open-ended response scales that use numbers, such as magnitude estimation. There are several ways to view this type of bias. Suppose that a series of stimuli have been arranged in increasing magnitude and they are spaced in subjectively equal steps. As the intensity increases, subjects change their strategy as they cross into ranges of numerical responses where there are more digits. For example, they might be rating the series using numbers like 2, 4, 6, and 8, but then when they get to 10, they will continue by larger steps, perhaps 20, 40, 60, 80. In Poulton’s view they proceed through the larger numerical responses “too rapidly.” A converse way of looking at this problem is that the perceived magnitude of the higher numbers is in smaller arithmetic steps as numbers get larger. For example, the difference between one and two seems much larger than the difference between 91 and 92. Poulton also points out that in addition to contraction of stimulus magnitude at very high levels, the converse is also operating and that people seem to illogically expand their subjective number range when using responses smaller than the number 3. One obvious way to avoid the problems in number bias is to avoid numbers altogether or to substitute line scaling or cross-modality matching to line length as a response instead of numerical rating techniques like magnitude estimation (Poulton, 1989).


    Transfer bias refers to the general tendency to use previous experimental situations and remembered judgments to calibrate oneself for later tasks. It may involve any of the biases in Poulton’s or Parducci’s theories. The situation is common when subjects are used in multiple experiments or when sensory panelists are used repeatedly in evaluations (Ward, 1987). People have memories and a desire to be internally consistent. Thus the ratings given to a product on one occasion may be influenced by ratings given to similar products on previous occasions. There are two ways to view this tendency. One is that the judgments may not shift appropriately when the panelists’ sensory experience, perception, or opinion of the product has in fact changed. On the other hand, one of the primary functions of panelist training and calibration in descriptive analysis is to build up exactly those sorts of memory references that may stabilize sensory judgments. So there is a positive light to this tendency as well. An open question for sensory evaluation is whether exposure to one continuum of sensory intensities or one type of product will transfer contextual effects to another sensory attribute or a related set of products (Murphy, 1982; Parducci et al., 1976; Rankin and Marks, 1991). And if so, how far does the transfer extend?

    9.4.3 Response Range Effects

    One of Poulton’s biases was called the “response range equalizing bias” in which the stimulus range is held constant but the response range changes and so do the ratings. Ratings expand or contract so that the entire range is used (minus any end-category avoidance). This is consistent with the “mapping” idea mentioned for stimulus range effects (stimuli are mapped onto the available response range). Range stabilization is implicit in the way some scaling studies have been set up and in the instructions given to subjects. This is similar to the use of physical reference standards in some descriptive analysis training (Muñoz and Civille, 1998) and is related to Sarris’s work on anchor stimuli (Sarris and Parducci, 1978). In Anderson’s work with 20-point category scales and line marking, high and low examples or end anchors are given to subjects to show them the likely range of the stimuli to be encountered. The range of responses is known since it is visible upon the page of the response sheet or has been pre-familiarized in a practice session (Anderson, 1974). Thus it is not surprising that subjects distribute their responses across the range in a nicely graded fashion, giving the appearance that there is a reasonably linear use of the scale. Anderson noted that there are end effects that work against the use of the entire range (i.e., people tend to avoid using the endpoints) but that these can be avoided by indenting the response marks for the stimulus end anchors, for example, at points 4 and 16 on the 20-point category scale. This will provide psychological insulation against the end effects by providing a comfort zone for unexpected or extreme stimuli at the ends of the scale while leaving sufficient gradations and room to move within the interior points. The “comfort zone” idea is one reason why early workers in descriptive analysis used line scales with indented vertical marks under the anchor phrases.

    An exception to the response range mapping rule is seen when anchor phrases or words on a scale are noted and taken seriously by participants. An example is in Green’s work on the labeled magnitude scale, which showed a smaller response range when it was anchored to “greatest imaginable sensation” that included all oral sensations including pain, as opposed to a wider range when the greatest imaginable referred only to taste (Green et al., 1996). This also looks like an example of contrast in which the high-end anchor can evoke a kind of stimulus context, at least in the participant’s mind. If the image evoked by the high-end phrase is very extreme, it acts like a kind of stimulus that compresses ratings into a smaller range of the scale. A similar kind of response compression was seen with the LAM scale when it was anchored to greatest imaginable liking for “sensations of any kind” as opposed to a more delimited frame such as “foods and beverages” (Cardello et al., 2008). A sensory scientist should consider how the high anchor phrase is interpreted, especially if he or she wants to avoid any compression of ratings along the response range. As Muñoz and Civille (1998) pointed out, the use of a descriptive analysis scale also depends a lot on the conceptualization of the high extreme. Does “extremely strong” refer to the strongest possible taste among all sensations and products, the strongest sensation in this product type, or just how strong this particular attribute can become in this particular product? The strongest sweetness in this product might be more intense than the strongest saltiness. The definition needs to be a deliberate choice of the panel leader and an explicit instruction to the panelists to give them a uniform frame of reference.

    9.4.4 The Centering Bias

    The centering bias arises when subjects become aware of the general level of stimulus intensity they are likely to encounter in an experiment and tend to match the center or midpoint of the stimulus range with the midpoint of the response scale. Poulton (1989) distinguished a stimulus centering bias from a response centering bias, but this distinction is primarily a function of how experiments are set up. In both cases, people tend to map the middle of the stimulus range onto the middle of the response range and otherwise ignore the anchoring implications of the verbal labels on the response scale. Note that the centering bias works against the notion that respondents can use unbalanced scales with any effectiveness. For example, the “Excellent-very good-good-fair-poor” scale commonly used in marketing research with consumers is unbalanced. The problem with unbalanced scales is that over many trials, the respondents will come to center their responses on the middle category, regardless of its verbal label.

    The centering bias is an important problem when there is a need to interpolate some value on a psychophysical function or to find an optimal product in just-right scaling. Poulton gives the example of McBride’s method for considering bias in the just-about-right (JAR) scale (McBride, 1982; see also Johnson and Vickers, 1987). In any series of products to be tested, say for just-right level of sweetness, there is a tendency to center the series so that the middle product will come out closest to the just-right point. The function shifts depending upon the range that is tested. One way to find the true just-right point would be to actually have the experimental series centered on that value, but then of course you would not need to do the experiment. McBride gives a method for interpolation across several experiments with different ranges. The point at which the just-right function and the median of the stimulus series will cross shows the unbiased or true just-right level. This method of interpolation is shown in Fig. 9.8. In this method, you present several ranges of the products in separate sessions and plot how the judgments of the JAR point shift up and down. You can then interpolate to find the range in which the just-right point would have coincided with the center product in the series. Running the test several times obviously takes more work, but it could avoid a mistaken estimate of the JAR level.

    Fig. 9.8 Adjusting for the centering bias in just-right ratings. Three series of sucrose concentrations in lemonade were tested, a low series (2-8%), a middle series (6-14%), and a high series (10-22%). In the upper panel, the method of Poulton is used to interpolate the unbiased just-right point from the series where the midpoint concentration would correspond to the just-right item. In the lower panel, the method of McBride is used to interpolate the just-right point from a series in which the average response would correspond to the just-right point. When the average response would be just right (zero on this scale), the hypothetical stimulus range would have been centered on the just-right level. Replotted from Johnson and Vickers (1987), with permission.
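    The interpolation idea in the lower panel of Fig. 9.8 can be sketched as a straight-line fit. The example below is a simplified reading of the approach, not McBride's published procedure: it assumes each series is summarized by its midpoint concentration and its mean just-right rating (zero meaning just right), fits a least-squares line through those points, and solves for the concentration at which the mean rating would be zero. The three midpoints come from the series ranges in the caption; the mean ratings are invented.

```python
def interpolate_just_right(series_centers, mean_jar_ratings):
    """Concentration at which a fitted line predicts a mean JAR rating of zero.

    series_centers:   center of each tested series (here, midpoints in % sucrose).
    mean_jar_ratings: mean just-right rating per series (0 = just right).
    """
    n = len(series_centers)
    mean_x = sum(series_centers) / n
    mean_y = sum(mean_jar_ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in series_centers)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(series_centers, mean_jar_ratings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Solve slope * x + intercept = 0 for x.
    return -intercept / slope


# Midpoints of the 2-8%, 6-14%, and 10-22% series; ratings are hypothetical.
centers = [5.0, 10.0, 16.0]
mean_ratings = [-0.8, 0.2, 1.4]
estimate = interpolate_just_right(centers, mean_ratings)
print(estimate)
```

    With real data the points will not fall exactly on a line, so it is worth inspecting the fit before trusting the interpolated value.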


    9.5 Response Correlation and Response Restriction

    9.5.1 Response Correlation

    Early experimental psychologists like Thorndike (1920) noted that one very positive attribute of a person could influence judgments on other, seemingly unrelated characteristics of that individual. In personnel evaluations of military officers, Thorndike noted a moderate positive correlation among the individual rated factors. People evaluate others like this in real life. If achievement in sports is influential in our assessment of a person, we might suppose a gifted athlete to also be kind to children, generous to charities, etc., even though there is no logical relationship between these characteristics. People like to have cognitive structures that form consistent wholes and are without conflicts or contradictions (called cognitive dissonance) that can make us uncomfortable. The halo effect has also been described as a carry-over from one positive product to another (Amerine et al., 1965), but its common usage is in reference to a positive correlation of unrelated attributes (Clark and Lawless, 1994). Of course, there can also be negative or horns effects, in which one salient negative attribute causes other, unrelated attributes to be viewed or rated negatively. If a product makes a mess in the microwave, it might be rated negatively for flavor, appearance, and texture as well.

    A simple example of a halo effect is shown in Fig. 9.9. In this case, a small amount of vanilla extract was added to low-fat milk, near the threshold of perceptibility. Ratings were then collected from 19 milk consumers for sweetness, thickness, creaminess, and liking for the spiked sample and for a control milk. In spite of the lack of relationship between vanilla aroma and sweet taste and between vanilla and texture characteristics, the introduction of this one positive aspect was sufficient to cause apparent enhancement in sweetness, creaminess, and thickness ratings. Apparent enhancement of sweetness is an effect long known for ethyl maltol, a caramelization product that has an odor similar to heated sugar (Bingham et al., 1990). When maltol is added to various products, sweetness ratings may rise compared to products lacking this flavor. However, the effect seems to be a case of the misattribution of olfactory stimulation to the taste sense. Murphy and Cain (1980) showed that citral (a lemon odor) could enhance taste ratings, but only when the nostrils were open, which allows diffusion of the odor into the nose and stimulation of the olfactory receptors (i.e., retronasal smell). When the nostrils are pinched shut, the diffusion is effectively eliminated and the enhancement disappears. Murphy and Cain interpreted this as convincing evidence that there was no true enhancement of taste intensity by citral, but only olfactory referral, a kind of confusion between taste and smell. Studies with other odors have also shown that the taste enhancement effect from volatile flavors can be eliminated by nose pinching (Frank and Byram, 1988) even for maltol (Bingham et al., 1990). The maltol effect is also minimized by training subjects who then learn to more effectively separate or localize their odor experiences from taste (Bingham et al., 1990). The sweetness enhancement may arise as a function of conditioning or experience with the pairing of sweet tastes with some odors in foods (Stevenson et al., 1995).

    [Figure 9.9: bar chart of mean rating + 1 S.E.M. (scale 1 to 13) for the scaled attributes sweet, creamy, thick, and like, comparing milk with added vanilla against control milk. Asterisks mark attributes different from control, paired t-test, N = 19.]

    Fig. 9.9 Adding a just perceivable level of vanilla extract to low-fat milk causes increases in rated sweetness, thickness, creaminess, and liking, an example of the halo effect. From Lawless and Clark (1994), with permission.

    Several lessons can be learned from the vanilla halo effect shown in Fig. 9.9. First, untrained consumers


    9.5.2 “Dumping” Effects: Inflation Due to Response Restriction in Profiling

    It is part of the folklore of consumer testing that if there is one very negative and salient attribute of a product, it will influence other attributes in a negative direction, an example of a horns effect. The effect is even worse when the salient negative attribute is omitted from the questionnaire. Omission could be due to some oversight or failure to anticipate the outcome in a consumer test or simply that it was not observed in the laboratory conditions of preliminary phases of testing. In this case, consumers will find a way to dump their frustration from not being able to report their dissatisfaction by giving negative ratings on other scales or reporting negative opinions of other, even unrelated, attributes. In other words, restricting responses or failing to ask a relevant question may change ratings on a number of other scales.

    A common version of this restriction effect can be seen in sweetness enhancement. Frank et al. (1993) found that the enhancement of sweet ratings in the presence of a fruity odor was stronger when ratings were restricted to sweetness only. When both sweetness and fruitiness ratings were allowed, no enhancement of sweetness was observed. Exactly the same effect was seen for sweetness and fruitiness ratings and for sweetness and vanilla ratings (Clark and Lawless, 1994). So allowing the appropriate number of attributes can address the problem of illusory enhancement. Schifferstein (1996) gave the example of hexenol, a fresh green aroma, which when added to a strawberry flavor mixture caused mean ratings in several other scales to increase. The enhancement of the other ratings occurred only when the “green” attribute was omitted from the ballot. When the “green” attribute was included in the ballot, the response was correctly assigned to that scale, and there was no apparent enhancement in the other attributes in the aroma profile.

    There is good news and bad news in these observations. From a marketing perspective, ratings can be easily obtained from consumers that will show apparent sweetness enhancements if the questionnaires cleverly omit the opportunity to report on sensations other than sweetness. However, the nose pinch conditions and the use of complete sets of attributes show us that these volatile odorants such as maltol are not sweet taste enhancers but they are sweet rating enhancers. That is, they are not affecting the actual perception of sweet taste intensity but are changing the response output function or perhaps broadening the concept of sweetness to go beyond taste and include pleasant aromas as well. It would not be wise to try to use maltol to sweeten your coffee.

    Are there other situations in sensory testing where the dumping effect can show up? One area in which responses are usually restricted to one attribute at a time is in time-intensity scaling (Chapter 8). In a common version of this technique, the subject moves a pointer, a mouse, or some other response device to provide a continuous record of sensory intensity for a specified attribute. Usually, just one attribute at a time is rated since it is very difficult to attend continuously or even by rapid shifting of attention to more than one attribute. This would seem to be a perfect opportunity for the dumping tendency to produce illusory enhancements (e.g., Bonnans and Noble, 1993). This idea was tested in experiments with repeated category ratings across time, a time-intensity procedure that allows for ratings of multiple attributes. These studies showed sweetness enhancement in sweet-fruity mixtures when sweetness alone was rated, but little or no enhancement when both sweetness and fruit intensity were rated over time (Clark and Lawless, 1994). This is exactly parallel to the sweetness enhancement results seen by Frank and colleagues. Workers using single-attribute time-intensity ratings should be wary of apparent enhancements due to response restriction.

    9.5.3 Over-Partitioning

    In the data of van der Klaauw and Frank (1996), one can also see cases in which having too many attributes causes a deflation in ratings. As in the dumping examples, their usual paradigm was to compare the sweetness ratings of a simple sucrose solution to the same concentration with a fruity odor added. When rating sweetness only, the rating is higher than when rating sweetness and fruitiness, the common dumping effect. But when total intensity and six additional attributes were rated, the sweetness rating was significantly lower than either of the other two conditions. In another example, including a bitterness rating (in addition to the sweetness and fruitiness ratings) lowered the sweetness rating compared to rating sweetness (the highest condition) and also compared to rating sweetness and fruitiness (an intermediate sweetness rating was obtained). This effect appears to be a deflation due to people over-partitioning their sensations into too many categories. The specific choices may be important in this effect. Adding only a bitter or a bitter and a floral rating had little or no effect, and dumping inflation was still observed, probably because there was no fruity rating. In the study by Clark and Lawless (1994), the control condition (sweetener only) showed some evidence of a decrement when the attributes for volatiles were also available. Even more dramatic was the complete reversal of sweet enhancement to inhibition when a large number of response categories were provided for simple mixtures (Frank et al., 1993). Although this effect has not been thoroughly studied, it serves to warn the sensory scientist that the number of choices given to untrained consumers may affect the outcome and that too many choices may be as dangerous as too few. Whether this effect might be seen with trained panels remains an open question. It is sometimes difficult to predetermine the correct number of attributes to rate in order to guard against the dumping effect. Careful pre-testing and discussion phases in descriptive training may help. It is obviously important to be inclusive and exhaustive, but also not to waste the panelists’ time with irrelevant attributes.

    9.6 Classical Psychological Errors and Other Biases

    A number of psychological errors in judgment have been described in the literature and are commonly listed in references in sensory evaluation (e.g., Amerine et al., 1965; Meilgaard et al., 2006). They are only briefly listed here, as they serve primarily as empirical descriptions of behavior, without much reference to cause or any theoretical bases. It is important to distinguish between description and explanation, and not to confuse naming something with trying to explain why it occurred in terms of mechanism or larger theory. The sensory evaluation practitioner needs to be aware of these errors and the conditions under which they may occur.

    9.6.1 Errors in Structured Sequences: Anticipation and Habituation

    Two errors may be seen when a non-random sequence of products is presented for evaluation, and the observer is aware that a sequence or a particular order of items is going to be presented. The error of anticipation is said to occur when the subject shifts responses in the sequence before the sensory information would indicate that it is appropriate to do so (Mattes and Lawless, 1985). An example is found in the method of limits for thresholds, where an ascending sequence is presented and the observer expects a sensation to occur at some point and “jumps the gun.” The opposite effect is said to be the error of habituation, in which the panelist stays put too long with one previous response, when the sensory information would indicate that a change is overdue. Obviously, the presentation of samples in random order will help to undo the expectations involved in causing the error of anticipation. Perseveration, as in the error of habituation, is a little bit harder to understand but may have to do with lack of attention or motivation on the part of the observer or having an unusually strict criterion for changing responses. Attention and motivation can be addressed by sufficient incentives and keeping the test session from being too long.

    219

    The stimulus error is another classical problem in sensory measurement. This occurs when the observer knows or presumes to know the identity of the stimulus and thus draws some inference about what it should taste, smell, or look like. The judgment is biased due to expectations about stimulus identity. In the old parlor game of trying to identify the origin and vintage of a wine, stimulus error is actually a big help. It is much easier to guess the wine if you know what the host is prone to drink or you have taken a peek at the bottles in the kitchen beforehand. In sensory evaluation, the principle of blind testing and the practice of using random three-digit coding of samples guard against the stimulus error. However, panelists are not always completely in the dark about the origin or the identity of samples. Employee panels may have a fair amount of knowledge about what is being tested and they may make inferences, correctly or incorrectly. For example, in small-plant quality assurance, workers may be aware of what types of products are being manufactured that day and these same workers may serve as sensory panelists. In the worst possible scenario, the persons drawing the samples from production are actually doing the tasting. As much as possible, these situations should be avoided. In quality control panels, insertion of blind control samples (both positive controls and flawed samples) will tend to minimize the guesswork by panelists.
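    The random three-digit coding mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch, not a procedure taken from the text: the function name `assign_blind_codes`, the sample labels, and the optional `seed` parameter are all assumptions introduced for the example.

```python
import random


def assign_blind_codes(sample_names, seed=None):
    """Give each sample a distinct random three-digit code (100-999).

    Hypothetical helper for illustration; the name and the `seed`
    parameter are assumptions, not part of any standard protocol.
    """
    rng = random.Random(seed)
    # sample() draws without replacement, so no two samples share a code.
    codes = rng.sample(range(100, 1000), k=len(sample_names))
    return {name: str(code) for name, code in zip(sample_names, codes)}


if __name__ == "__main__":
    # Illustrative labels only; panelists would see the codes, never the names.
    for name, code in assign_blind_codes(["control", "prototype A", "flawed lot"]).items():
        print(code, "<-", name)
```

    Drawing the codes without replacement keeps them unique within a session, and the mapping from code to product stays with the experimenter so that the labels carry no information about sample identity.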

    the sequential effects may be counterbalanced or averaged out in the group data. The second approach is to consider the order effects of interest. In this case, different orders are analyzed as a purpose of the experiment and, if order effects are observed, they are duly noted and discussed. Whether order effects are of interest will depend upon the circumstances of the product evaluation and its goals. If counterbalanced orders or randomization cannot be worked into the experimental design, the experimenter must consider whether product differences are true sensory differences or are artifacts of stimulus order. Purposeful experimentation and analysis of order effects in at least some of the routine evaluations may give one an appreciation for where and when these effects occur in the product category of interest. Another well-known order effect in acceptance testing is the reception of a higher score for the first sample in a series (Kofes et al., 2009). Counterbalancing orders is of course appropriate, but one can also give a “dummy” product first to absorb the first product’s s

    Louis Koo (Cổ Thiên Lạc) Film Collection

    ACTOR INFORMATION

    Nickname: Louis Koo

    Height: 1.81 m

    Hometown: Hong Kong

    Date of birth: 21-10-1970

      Dealer Healer (Độc Giới)

      Year of production: 2021

      Chen Hua, the leader of a notorious gang, has lost his family and his lover. He was also sentenced to prison for drug abuse and human trafficking. After serving his sentence, he leaves prison and helps…

      Shared by: Lan Chi

      Khai Tâm Ma Pháp (A Magic To Win)

      Rating: 5

      Year of production: 2011

      Khai Tâm Ma Pháp revolves around people in the city who look perfectly ordinary but in fact possess superpowers. The film follows characters who have…

      Shared by: Liên Hoa

      Bát Tinh Báo Hỷ (All’s Well, Ends Well)

      Rating: 5.7

      Year of production: 2012

      Bát Tinh Báo Hỷ (All’s Well, Ends Well 2012) continues the All’s Well, Ends Well series of previous years and keeps its romantic-comedy style, with family love, romance, and friendship, … telling the story of the…

      Shared by: Hiền Mai

      Kế Hoạch Baby (Robin-B-Hood)

      Rating: 6.7

      Year of production: 2006

      In Kế Hoạch Baby, for Fong, a ne'er-do-well gambling addict, only one thing is more frightening than the debtors at his front door: having to soothe a crying baby. But…

      Shared by: Ðức Giang

      Môn Đồ (Protege)

      Rating: 7.3

      Year of production: 2007

      In Môn Đồ, special agent Nick is assigned to go undercover in the organization at the center of Asia's opium trade, posing as an underling of one of the key figures in…

      Shared by: Chiêu Dương

      Ngòi Nổ (Flash Point)

      Rating: 6.8

      Year of production: 2007

      Ngòi Nổ tells the story of Mã Quân, a member of the Hong Kong serious crimes unit who has risked his life many times to complete his missions. Because of his upright, even hot-tempered, nature, he does not…

      Shared by: Minh Hạnh

      Đơn Thân Nam Nữ (Don’t Go Breaking My Heart)

      Rating: 6.5

      Year of production: 2011

      Đơn Thân Nam Nữ is a feature film about a love story among three people who work at the same company. One of them is the company's director, a womanizing playboy…

      Shared by: Quốc Anh

      Độc Chiến (Drug War)

      Rating: 7

      Year of production: 2013

      Độc Chiến is about an unusual accident that exposes a major drug case. The story begins when a driver loses control of his vehicle and crashes into a shop…

      Shared by: Thiên Khánh

      Tình Trên Non Cao (Romancing In Thin Air)

      Rating: 6

      Year of production: 2012

      Cao Hải Bạt Chi Luyến 2 is a romantic drama in which Cổ Thiên Lạc and Cao Viên Viên play two famous actors. In the film, the two are a good-looking on-screen couple and also…

      Shared by: Ngọc Oanh

      Trăm Năm Hạnh Phúc (Love For All Seasons)

      Rating: 5.4

      Year of production: 2003

      In Trăm Năm Hạnh Phúc, Tiger, a rich and famous Hong Kong businessman, contracts a hard-to-cure illness and must travel to China to seek treatment from a renowned healer. There he meets May…

      Shared by: Diễm Khuê

      Nhu Đạo Long Hổ Bang (Throw Down)

      Rating: 6.9

      Year of production: 2004

      The story of Nhu Đạo Long Hổ Bang revolves around Tư Đồ Bảo, a renowned judo fighter who, at the peak of his career, suddenly quits and becomes the dissolute owner of a bar…

      Shared by: Thục Anh

      Dã Chiến Giang Hồ (Born Wild)

      Rating: 6.4

      Year of production: 2001

      In Dã Chiến Giang Hồ, Trương Hồ and Tấn Hồ are twin brothers with completely different personalities. Tấn is energetic and sociable, while Trương has a…

      Shared by: Giáng Ngọc

      Xã Hội Đen 2 (Election 2)

      Rating: 7.4

      Year of production: 2006

      In Xã Hội Đen 2, Jimmy, an educated gangster with the polished, intellectual air of a businessman, cares only about making money and getting rich. He tries to reach out to officials and build connections to…

      Shared by: Bá Kỳ

      Đào Xuất Sinh Thiên (Out of Inferno)

      Rating: 5.9

      Year of production: 2013

      In Đào Xuất Sinh Thiên, on the hottest day in 50 years, a terrible fire breaks out in a commercial tower. A fearless firefighting team takes on the task of putting out…

      Shared by: Nguyên Hồng

      Con Bão Trắng (The White Storm)

      Rating: 6.7

      Year of production: 2013

      Con Bão Trắng tells of an undercover police officer (Cổ Thiên Lạc) who joins the underworld to break up a drug ring headed by Hắc Sài. In the film, he has to work closely with investigators…

      Shared by: Chí Khiêm

      Xin Đừng Gác Máy (Connected)

      Rating: 6

      Year of production: 2008

      Xin Đừng Gác Máy is a Hong Kong remake of the 2004 American film Cellular. Even so, with performances by the actress Từ Hy Viên and the star Cổ Thiên Lạc, the film has…

      Shared by: Thanh Lam

      Thiết Thính Phong Vân 2 (Overheard 2)

      Rating: 6.8

      Year of production: 2011

      Thiết Thính Phong Vân 2 follows the success of the first film, but the sequel focuses on the stock market and revolves around three men, their relationships…

      Shared by: Thiên Lan

      Thiết Thính Phong Vân (Overheard)

      Rating: 7.2

      Year of production: 2009

      Thiết Thính Phong Vân is an action drama about a group of criminal intelligence police officers (C.I.B) who disguise themselves as paparazzi to gather information. On one occasion, in the course of their work…

      Shared by: Hồng Liên

      Thương Vương Chi Vương (Triple Tap)

      Rating: 6.4

      Year of production: 2010

      In Thương Vương Chi Vương, the manager Ken (Louis Cổ Thiên Lạc) defeats the police officer Jerry Chang (Daniel Ngô Ngạn Tổ) in a shooting match. Afterwards the two become friends. Ken happens to run into…

      Shared by: Vi Quyên

      Xin Chào Mấy Cưng (Hello Babies)

      Rating: 5.5

      Year of production: 2014

      With a cast of veteran comedians, Xin Chào Mấy Cưng (Hello Babies) is about elderly Chinese people who want the younger generation to have children as soon as possible. The film also deals with…

      Shared by: Hoàng Khôi

      Thiết Thính Phong Vân 3 (Overheard 3)

      Rating: 6.1

      Year of production: 2014

      Thiết Thính Phong Vân 3 is set in China in the 1980s and delves into the inner lives and personalities of its characters through dialogue as well as…

      Shared by: Việt Khang

      Hương Cảng Tử (Aberdeen)

      Rating: 6

      Year of production: 2014

      Hương Cảng Tử revolves around the story of a family: a young couple who have been married for more than six years and have a lovely daughter. It seems as though their happiness will last forever…

      Shared by: Thúy Minh

    Bought Land for Next to Nothing, Never Suspecting a ‘Gold Mine’ Lay Underground

    More than 100 years ago, a lawyer from Toronto (Canada) paid 1.5 dollars per acre (about 35,000 VND) for a tract of barren land in Texas (USA). Offers for the land now exceed 10 billion dollars, and his family still has no wish to sell.

    In the Permian region (Texas, USA), oil extraction is so intense that the lights never go out. Wellheads, drilling rigs, and trailers operate around the clock. Yet one dark, untapped area remains.

    It is a vast area of more than 250 square miles (nearly 65,000 ha) holding oil and gas reserves estimated to be worth 7 billion dollars, owned by the Fasken family. In an era when everything in the region seems to be for sale, the Fasken ranch stays quiet and accepts no offers, even at a price of 10 billion dollars. The owners simply are not interested in selling their land for money.

    Back in 1913, the lawyer David Fasken paid 1.5 dollars per acre for a stretch of plains outside Midland, planning to establish a cattle ranch. There was too little water, however, for him to carry out that plan. When Fasken died 16 years later, no one in his family knew how abundant the resources beneath the ground were.

    The Permian Basin, which holds mainly shale oil, lies in West Texas and southeastern New Mexico and can produce more crude oil than any member of the Organization of the Petroleum Exporting Countries (OPEC) except Saudi Arabia and Iraq.

    Oil majors are planning to raise output in the Permian Basin by acquiring land. Two years ago, Exxon Mobil Corp bought the land of the Bass family in Dallas for 6 billion dollars. The next deal may involve Endeavour Energy Resources LP, owned by Autry Stephen, which is in talks to be sold to Royal Dutch Shell Plc for more than 10 billion USD.

    In contrast to the flurry of deals around it, Fasken Oil & Ranch still follows the West Texas tradition and its motto, “never sell your oil fields.” After David Fasken died, the land passed to Barbara Fasken, a relative of David Fasken by marriage. Today the property belongs to David Fasken's son, Robert Dickson (who owns 50% of the assets), and two grandsons, each of whom holds 25%.

    “The family has never given any thought to selling the land,” said Jimmy Davis, who runs Fasken's operations.

    According to Huy Nguyễn/Dân Việt

    According to Kiến Thức
