An examination (exam or evaluation) or test is an educational assessment intended to measure a test-taker's knowledge, skill, aptitude, physical fitness, or classification in many other topics. A test may be administered verbally, on paper, on a computer, or in a predetermined area that requires a test taker to demonstrate or perform a set of skills.
Tests vary in style, rigor and requirements. There is no general consensus or invariable standard for test formats and difficulty. Often, the format and difficulty of a test depend upon the educational philosophy of the instructor, the subject matter, class size, the policy of the educational institution, and the requirements of accreditation or governing bodies.
A test may be administered formally or informally. An example of an informal test is a reading test administered by a parent to a child. A formal test might be a final examination administered by a teacher in a classroom or an IQ test administered by a psychologist in a clinic. Formal testing often results in a grade or a test score. [Thissen, D. & Wainer, H. (2001). Test Scoring. Mahwah, NJ: Erlbaum, p. 1.] A test score may be interpreted with regard to a norm or criterion, or occasionally both. The norm may be established independently, or by statistical analysis of a large number of participants.
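The two ways of interpreting a score can be illustrated with a minimal sketch. The scores, the cut score of 70, and the function names below are hypothetical and chosen only for illustration; actual testing programs use more elaborate norming and standard-setting procedures.

```python
from bisect import bisect_left


def percentile_rank(score, norm_group):
    """Norm-referenced view: percentage of the norm group scoring below `score`."""
    ordered = sorted(norm_group)
    below = bisect_left(ordered, score)  # count of scores strictly below `score`
    return 100.0 * below / len(ordered)


def meets_criterion(score, cut_score):
    """Criterion-referenced view: compare the score with a fixed standard."""
    return score >= cut_score


# Hypothetical scores from a norm group of ten test takers.
norm_group = [52, 58, 61, 64, 67, 70, 73, 77, 82, 90]

print(percentile_rank(73, norm_group))    # standing relative to the group
print(meets_criterion(73, cut_score=70))  # standing relative to the standard
```

The same raw score can therefore be read in two ways: as a position within a group, which changes if the group changes, or as a pass or fail against a standard that does not depend on other test takers.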
A test may be developed and administered by an instructor, a clinician, a governing body, or a test provider. In some instances, the developer of the test may not be directly responsible for its administration. For example, in the United States, Educational Testing Service (ETS), a nonprofit educational testing and assessment organization, develops standardized tests such as the SAT but may not directly be involved in the administration or proctoring of these tests.
The bureaucratic imperial examinations as a concept have their origins in the year 605 during the short-lived Sui dynasty. Its successor, the Tang dynasty, implemented imperial examinations on a relatively small scale until the examination system was extensively expanded during the reign of Wu Zetian. [Paludin, 97.] Included in the expanded examination system was a military exam that tested physical ability, but the military exam never had a significant impact on the Chinese officer corps and military degrees were seen as inferior to their civil counterpart. The exact nature of Wu's influence on the examination system is still a matter of scholarly debate.
During the Song dynasty the emperors expanded both examinations and the government school system, in part to counter the influence of hereditary nobility, increasing the number of degree holders to more than five times that of the Tang. From the Song dynasty onward, the examinations played the primary role in selecting scholar-officials, who formed the literati elite of society. However, the examinations co-existed with other forms of recruitment such as direct appointments for the ruling family, nominations, quotas, clerical promotions, sale of official titles, and special procedures for eunuchs. The regular higher level degree examination cycle was decreed in 1067 to be 3 years but this triennial cycle only existed in nominal terms. In practice both before and after this, the examinations were irregularly implemented for significant periods of time: thus, the calculated statistical averages for the number of degrees conferred annually should be understood in this context. The jinshi exams were not a yearly event and should not be considered so; the annual average figures are a necessary artifact of quantitative analysis. [Kracke, 252.] The operations of the examination system were part of the imperial record keeping system, and the date of receiving the jinshi degree is often a key biographical datum: sometimes the date of achieving jinshi is the only firm date known for even some of the most historically prominent persons in Chinese history.
The examinations were briefly interrupted at the beginning of the Mongol Yuan dynasty in the 13th century, but were later brought back with regional quotas which favored the Mongols and disadvantaged Southern Chinese. During the Ming and Qing dynasties, the system contributed to the narrow and focused nature of intellectual life and enhanced the autocratic power of the emperor. The system continued with some modifications until its abolition in 1905 during the last years of the Qing dynasty. The modern examination system for selecting civil servants also indirectly evolved from the imperial one. [Ebrey, Patricia Buckley (2010). The Cambridge Illustrated History of China. Cambridge: Cambridge University Press, 2nd Ed., pp. 145–147, 198–200.]
The earliest evidence of examinations in Europe dates to 1215 or 1219 in Bologna. These were chiefly oral in the form of a question or answer, disputation, determination, defense, or public lecture. The candidate gave a public lecture of two prepared passages assigned to him from the civil or canon law, and then doctors asked him questions, or expressed objections to answers. Evidence of written examinations does not appear until 1702 at Trinity College, Cambridge. According to Sir Michael Sadler, Europe may have had written examinations since 1518 but he admits the "evidence is not very clear." In Prussia, medication examinations began in 1725. The Mathematical Tripos, founded in 1747, is commonly believed to be the first honor examination, but James Bass Mullinger considered "the candidates not having really undergone any examination whatsoever" because the qualification for a degree was merely four years of residence. France adopted the examination system in 1791 as a result of the French Revolution but it collapsed after only ten years. Germany implemented the examination system around 1800.
Englishmen in the 18th century such as Eustace Budgell recommended imitating the Chinese examination system but the first English person to recommend competitive examinations to qualify for employment was Adam Smith in 1776. In 1838, the Congregational church missionary Walter Henry Medhurst considered the Chinese exams to be "worthy of imitating." In 1806, the British established a Civil Service College near London for training of the East India Company's administrators in India. This was based on the recommendations of British East India Company officials who were serving in China and had seen the Imperial examinations. In 1829, the company introduced civil service examinations in India on a limited basis. This established the principle of a qualification process for civil servants in England. [Bodde, Derk, Chinese Ideas in the West. Committee on Asiatic Studies in American Education.] In 1847 and 1856, Thomas Taylor Meadows strongly recommended the adoption of the Chinese principle of competitive examinations in Great Britain in his Desultory Notes on the Government and People of China. According to Meadows, "the long duration of the Chinese empire is solely and altogether owing to the good government which consists in the advancement of men of talent and merit only." Both Thomas Babington Macaulay, who was instrumental in passing the Saint Helena Act 1833, and Stafford Northcote, 1st Earl of Iddesleigh, who prepared the Northcote–Trevelyan Report that catalyzed the British civil service, were familiar with Chinese history and institutions.
The Northcote–Trevelyan Report of 1854 made four principal recommendations: that recruitment should be on the basis of merit determined through standardized written examination, that candidates should have a solid general education to enable inter-departmental transfers, that recruits should be graded into a hierarchy, and that promotion should be through achievement, rather than 'preferment, patronage, or purchase'. [Kazin, Edwards, and Rothman (2010), 142.]
When the report was brought up in parliament in 1853, Lord Monteagle argued against the implementation of open examinations because it was a Chinese system and China was not an "enlightened country." Lord Stanley called the examinations the "Chinese Principle." The Earl of Granville did not deny this but argued in favor of the examination system, considering that the minority Manchus had been able to rule China with it for over 200 years. In 1854, Edwin Chadwick reported that some noblemen did not agree with the measures introduced because they were Chinese. The examination system was finally implemented in the British Indian Civil Service in 1855, prior to which admission into the civil service was purely a matter of patronage, and in England in 1870. Even as late as ten years after the competitive examination plan was passed, people still attacked it as an "adopted Chinese culture." Alexander Baillie-Cochrane, 1st Baron Lamington insisted that the English "did not know that it was necessary for them to take lessons from the Celestial Empire." In 1875, Archibald Sayce voiced concern over the prevalence of competitive examinations, which he described as "the invasion of this new Chinese culture." [Ssu-yu Teng, "Chinese Influence on the Western Examination System", Harvard Journal of Asiatic Studies 7 (1942–1943): 267–312.]
After Great Britain's successful implementation of systematic, open, and competitive examinations in India in the 19th century, similar systems were instituted in the United Kingdom itself, and in other Western nations. [Wu, 417.] As with the British civil service, the development of the French and American civil services was influenced by the Chinese system. When Thomas Jenckes made a Report from the Joint Select Committee on Retrenchment in 1868, it contained a chapter on the civil service in China. In 1870, William Spear wrote a book called The Oldest and the Newest Empire-China and the United States, in which he urged the United States government to adopt the Chinese examination system. As in Britain, many of the American elites scorned the plan to implement competitive examinations, which they considered foreign, Chinese, and "un-American." As a result, the civil service reform introduced into the House of Representatives in 1868 was not passed until 1883. The Civil Service Commission tried to combat such sentiments in its report:
Both World War I and World War II demonstrated the necessity of standardized testing and the benefits associated with these tests. Tests were used to determine the mental aptitude of recruits to the military. The US Army used the Stanford–Binet Intelligence Scale to test the IQ of the soldiers. [Kaplan, R. M. & Saccuzzo, D. P. (2009). Psychological Testing. Belmont, CA: Wadsworth.] After the war, industry began using tests to evaluate applicants for various jobs based on performance. In 1952, the first Advanced Placement (AP) test was administered to begin closing the gap between high schools and colleges.
Some countries such as the United Kingdom and France require all their secondary school students to take a standardized test on individual subjects, such as the General Certificate of Secondary Education (GCSE) (in England) and the Baccalauréat respectively, as a requirement for graduation. These tests are used primarily to assess a student's proficiency in specific subjects such as mathematics, science, or literature. In contrast, high school students in other countries such as the United States may not be required to take a standardized test to graduate. Moreover, students in these countries usually take standardized tests only to apply for a position in a university program and are typically given the option of taking different standardized tests such as the ACT or SAT, which are used primarily to measure a student's reasoning skill. High school students in the United States may also take Advanced Placement tests on specific subjects to fulfill university-level credit. Depending on the policies of the test maker or country, administration of standardized tests may be done in a large hall, classroom, or testing center. An exam invigilator may also be present during the testing period to provide instructions, to answer questions, or to prevent cheating.
Grades or test scores from standardized tests may also be used by universities to determine whether a student applicant should be admitted into one of their academic or professional programs. For example, universities in the United Kingdom admit applicants into their undergraduate programs based primarily or solely on an applicant's grades on pre-university qualifications such as the GCE A-levels or Cambridge Pre-U. In contrast, universities in the United States use an applicant's test score on the SAT or ACT as just one of their many admission criteria to determine whether an applicant should be admitted into one of their undergraduate programs. The other criteria in this case may include the applicant's grades from high school, extracurricular activities, personal statement, and letters of recommendation. Once admitted, undergraduate students in the United Kingdom or United States may be required by their respective programs to take a comprehensive examination as a requirement for passing their courses or for graduating from their respective programs.
Standardized tests are sometimes used by certain countries to manage the quality of their educational institutions. For example, the No Child Left Behind Act in the United States requires individual states to develop assessments for students in certain grades. In practice, these assessments typically appear in the form of standardized tests. Test scores of students in specific grades of an educational institution are then used to determine the status of that educational institution, i.e., whether it should be allowed to continue to operate in the same way or to receive funding.
Finally, standardized tests are sometimes used to compare proficiencies of students from different institutions or countries. For example, the Organisation for Economic Co-operation and Development (OECD) uses the Programme for International Student Assessment (PISA) to evaluate certain skills and knowledge of students from different participating countries.
A single test can have multiple qualities. For example, the Bar examination for aspiring lawyers may be a norm-referenced, standardized, summative assessment. This means that only the test takers with higher scores will pass, that all of them took the same test under the same circumstances and were graded with the same scoring standards, and that the test is meant to determine whether the law school graduates have learned enough to practice their profession.
In some tests where knowledge of many constants or technical terms is required to answer questions effectively, as in chemistry or biology, the test developer may allow every test taker to bring a cheat sheet.
A test developer's choice of which style or format to use when developing a written test is usually arbitrary given that there is no single invariant standard for testing. Be that as it may, certain test styles and formats have become more widely used than others. Below is a list of those formats of test items that are widely used by educators and test developers to construct paper or computer-based tests. As a result, these tests may consist of only one type of test item format (e.g., multiple-choice test, essay test) or may have a combination of different test item formats (e.g., a test that has multiple-choice and essay items).
There are several reasons for using multiple-choice questions in tests. In terms of administration, multiple-choice questions usually require less time for test takers to answer, are easy to score and grade, provide greater coverage of material, allow for a wide range of difficulty, and can easily diagnose a test taker's difficulty with certain concepts. As an educational tool, multiple-choice items test many levels of learning as well as a test taker's ability to integrate information, and they provide feedback to the test taker about why distractors were wrong and why correct answers were right. Nevertheless, there are difficulties associated with the use of multiple-choice questions. In administrative terms, multiple-choice items that are effective usually take a great deal of time to construct. As an educational tool, multiple-choice items do not allow test takers to demonstrate knowledge beyond the choices provided and may even encourage guessing or approximation due to the presence of at least one correct answer. For instance, a test taker might not work out the exact result of a calculation but, knowing that a rough estimate gives 48, would choose an answer close to 48. Moreover, test takers may misinterpret these items and, in the process, perceive them to be tricky or picky. Finally, multiple-choice items do not test a test taker's attitudes towards learning because correct responses can be easily faked.
The difficulties with essay items are primarily administrative: for example, test takers require adequate time to be able to compose their answers. When these questions are answered, the answers themselves are usually poorly written because test takers may not have time to organize and proofread their answers. In turn, it takes more time to score or grade these items. When these items are being scored or graded, the grading process itself becomes subjective as non-test related information may influence the process. Thus, considerable effort is required to minimize the subjectivity of the grading process. Finally, as an assessment tool, essay questions may potentially be unreliable in assessing the entire content of a subject matter.
Instructions to exam candidates rely on the use of command words, which direct the examinee to respond in a particular way, for example by describing or defining a concept, or comparing and contrasting two or more scenarios or events. Some command words require more insight or skill than others: for example, "analyse" and "synthesise" assess higher-level skills than "describe". [NEBOSH, Guidance on command words used in learning outcomes and question papers – Diploma qualifications, version 5, June 2021, accessed 23 September 2023.] More demanding command words usually attract greater mark weighting in the examination. In the UK, Ofqual maintains an official list of command words explaining their meaning. [AQA, Command words, accessed 27 December 2018.] The Welsh government's guidance on the use of command words advises that they should be used "consistently and correctly", but notes that some subjects have their own traditions and expectations in regard to candidates' responses, [Welsh Government, Fair access by design, Guidance document 174/2015, issued June 2015, accessed 15 August 2020] and Cambridge Assessment notes that in some cases, subject-specific command words may be used. [Cambridge Assessment, Understanding Command Words, accessed 23 September 2023.]
Higher-level mathematical papers may include variations on true/false, where the candidate is given a statement and asked to verify its validity by direct proof or stating a counterexample.
Common tests include timed running or the multi-stage fitness test (commonly known as the "beep test"), and the number of sit-ups and pull-ups that the individual can perform. More specialised tests may be used to test ability to perform a particular job or role. Many gyms, private organisations and event organizers have their own fitness tests, using military techniques developed by the British Army and modern tests such as the Illinois Agility Run and the Cooper Test.
Stopwatch timing was common until recent years, when hand timing proved to be inaccurate and inconsistent. Electronic timing is now the standard, as it promotes accuracy and consistency and lessens bias.
An example is a behind-the-wheel driving test to obtain a driver's license. Rather than only answering simple multiple-choice items regarding the driving of an automobile, a student is required to actually drive one while being evaluated.
Performance tests are commonly used in workplace and professional applications, such as professional certification and licensure. When used for personnel selection, the tests might be referred to as a work sample. A licensure example would be cosmetologists being required to demonstrate a haircut or manicure on a live person. The Group–Bourdon test is one of a number of psychometric tests which trainee train drivers in the UK are required to pass.
Some performance tests are simulations. For instance, the assessment to become certified as an ophthalmic technician includes two components, a multiple-choice examination and a computerized skill simulation. The examinee must demonstrate the ability to complete seven tasks commonly performed on the job, such as retinoscopy, that are simulated on a computer.
Prior to the examination period most students in the Commonwealth have a week or so of intense revision and study known as swotvac.
In the United Kingdom, most universities hold a single set of "Finals" at the end of the entire degree course. In Australia, the exam period varies, with high schools commonly assigning one or two weeks for final exams, but the university period—sometimes called "exam week" or just "exams"—may stretch to a maximum of three weeks.
Practice varies widely in the United States; "finals" or the "finals period" at the university level constitutes two or three weeks after the end of the academic term, but sometimes exams are administered in the last week of instruction. Some institutions designate a "study week" or "reading period" between the end of instruction and the beginning of finals, during which no examinations may be administered. Students at many institutions know the week before finals as "dead week." Most final exams incorporate the reading material that has been assigned throughout the term.
Though common in French tertiary institutions, final exams are not often assigned in French high schools. However, French high school students hoping to continue their studies at university level will sit a national exam, known as the Baccalauréat.
In some countries and locales that hold standardised exams, it is customary for schools to administer mock examinations, with formats modelling the real exam. Students from different schools are often seen exchanging mock papers as a means of test preparation.
The process of test construction has been aided in several ways. For one, many test developers were themselves students at one time, and therefore are able to modify or outright adopt questions from their previous tests. In some countries, book publishers often provide teaching packages that include test banks to university instructors who adopt their published books for their courses. These test banks may contain up to four thousand sample test questions that have been peer-reviewed and time-tested. An instructor who chooses to use such a test bank would only have to select a fixed number of questions from it to construct a test.
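Selecting a fixed number of questions from a test bank can be sketched as follows. This is a minimal illustration rather than a description of any publisher's software: the bank contents, the `build_test` function, and the seed value are all hypothetical.

```python
import random


def build_test(test_bank, num_questions, seed=None):
    """Return `num_questions` distinct questions drawn at random from the bank."""
    if num_questions > len(test_bank):
        raise ValueError("test bank has too few questions")
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return rng.sample(test_bank, num_questions)


# Hypothetical bank of 4,000 placeholder questions.
test_bank = [f"Question {i}" for i in range(1, 4001)]

exam = build_test(test_bank, num_questions=40, seed=2024)
print(len(exam))  # number of questions selected
```

Sampling without replacement guarantees that no question appears twice on the same test. In practice an instructor would usually also filter by topic or difficulty before sampling.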
As with test construction, the time needed for a test taker to prepare for a test is dependent upon the frequency of the test, the test developer, and the significance of the test. In general, nonstandardized tests that are short, frequent, and do not constitute a major portion of the test taker's overall course grade or score do not require the test taker to spend much time preparing for the test. Conversely, nonstandardized tests that are long, infrequent, and do constitute a major portion of the test taker's overall course grade or score usually require the test taker to spend great amounts of time preparing for the test. To prepare for a nonstandardized test, test takers may rely upon their reference books, class or lecture notes, the Internet, and past experience. Test takers may also use various learning aids, such as mnemonics, to study for tests. Test takers may even hire tutors to coach them through the process so that they may increase the probability of obtaining a desired test grade or score. In countries such as the United Kingdom, demand for private tuition has increased significantly in recent years. Finally, test takers may rely upon past copies of a test from previous years or semesters to study for a future test. These past tests may be provided by a friend or a group that has copies of previous tests, by instructors and their institutions, or by the test provider (such as an examination board) itself.
Unlike a nonstandardized test, the time needed by test takers to prepare for standardized tests is less variable and usually considerable. This is because standardized tests are usually uniform in scope, format, and difficulty and often have important consequences with respect to a test taker's future, such as a test taker's eligibility to attend a specific university program or to enter a desired profession. It is not unusual for test takers to prepare for standardized tests by relying upon commercially available books that provide in-depth coverage of the standardized test or compilations of previous tests (e.g., ten year series in Singapore). In many countries, test takers even enroll in test preparation centers or cram schools that provide extensive or supplementary instruction to help them better prepare for a standardized test. In Hong Kong, it has been suggested that the tutors running such centers are celebrities in their own right. This has led to private tuition being a popular career choice for new graduates in developed economies. Finally, in some countries, instructors and their institutions have also played a significant role in preparing test takers for a standardized test.
Several common methods have been employed to combat cheating. They include the use of multiple proctors or invigilators during a testing period to monitor test takers. Test developers may construct multiple variants of the same test to be administered to different test takers at the same time, or write tests with few multiple-choice options, based on the theory that fully worked answers are difficult to imitate. In some cases, instructors themselves may not administer their own tests but will leave the task to other instructors or invigilators, which may mean that the invigilators do not know the candidates, and thus some form of identification may be required. Finally, instructors or test providers may compare the answers of suspected cheaters on the test themselves to determine whether cheating did occur.
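One simple way to construct multiple variants of the same test is to present the same questions in a different order to different test takers. The sketch below assumes this reordering approach; the question identifiers and the `make_variant` function are hypothetical, and real test developers may instead vary the questions themselves.

```python
import random


def make_variant(questions, variant_id):
    """Return the same questions in an order determined by `variant_id`."""
    rng = random.Random(variant_id)  # the same id always yields the same order
    shuffled = list(questions)       # copy so the master list is left untouched
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical master list of question identifiers.
questions = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]

variant_a = make_variant(questions, variant_id=1)
variant_b = make_variant(questions, variant_id=2)
print(variant_a)
print(variant_b)
```

Because each variant is derived from its identifier, the grader can regenerate the question order for any variant and map responses back to the master answer key.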