Collection of Data

Data: Meaning, Types and Sources

Any statistical study begins by gathering facts and figures. This is the collection of data, the first stage of a statistical investigation. Data are numerical facts or information collected for a definite purpose, such as the marks of students or the daily wages of workers.

Before collecting data, the investigator must be clear about the purpose, scope, and units of measurement of the enquiry. Data are of two main types depending on who collected them:

  • Primary data — data collected for the first time, directly by the investigator, for a specific purpose. They are original and have not been used before (e.g. a student surveying classmates about their pocket money). Primary data are more reliable and suited to the purpose, but they take more time and money.
  • Secondary data — data that have already been collected by someone else and are used again. They are second-hand (e.g. figures taken from a newspaper, a government report or a website). Secondary data save time and cost, but they may not exactly fit the new purpose and their reliability must be checked.

The chief sources of secondary data are published sources (government reports, the Census of India, RBI bulletins, newspapers, journals, international bodies like the UN) and unpublished sources (private records, registers, research files). The golden rule: before using secondary data, check who collected them, how, when and for what purpose.

Worked Example
Example 1: What is the difference between primary and secondary data?
Solution

It depends on who collected them.

  • Primary data are collected for the first time by the investigator for a specific purpose (original).
  • Secondary data were already collected by someone else and are reused (second-hand).
Worked Example
Example 2: A student takes population figures from the Census of India report. Is this primary or secondary data?
Solution

The student did not collect it.

  • The Census collected it; the student is reusing it.
  • So it is secondary data.
Worked Example
Example 3: State one advantage and one disadvantage of secondary data.
Solution

Weigh cost against fit.

  • Advantage: saves time and money (already available).
  • Disadvantage: may not exactly suit the new purpose and reliability must be checked.

Key Points

    • Data = numerical facts collected for a purpose; collection is the first stage of a statistical investigation.
    • Primary data: collected first-hand for a specific purpose (original, reliable, costly).
    • Secondary data: already collected by others, reused (cheap, quick, must check reliability).
    • Secondary sources: published (Census, RBI, reports) and unpublished (private records).

Quick Check
Q1. Data collected for the first time by the investigator are called:
Explanation: Primary data are original, collected first-hand for a specific purpose.
Q2. Figures taken from a government report and reused are:
Explanation: Data already collected by others and reused are secondary data.

Census and Sample Methods

When collecting primary data about a group, the investigator must decide how much of the group to study. The whole group being studied is called the population (or universe), and each member of it is a unit. There are two methods:

  • Census method (complete enumeration): every single unit of the population is studied. For example, the Census of India counts every person in the country. This method is very accurate and gives complete information, but it is costly, slow and needs a lot of effort, so it is practical only when the population is small or when total accuracy is essential.
  • Sample method — only a part (a sample) of the population is selected and studied, and the results are taken to represent the whole. For example, a TV-rating agency surveys a few thousand households, not every household. This method is cheaper, faster and needs less effort; it is the most common method in practice.
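The contrast between the two methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 workers, the wage range and the sample size of 500 are all made up for the example and do not come from any real survey.

```python
import random
import statistics

# Hypothetical population: daily wages (in rupees) of 10,000 workers.
# The figures are invented for illustration only.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(300, 900) for _ in range(10_000)]

# Census method: every unit of the population is studied.
census_mean = statistics.mean(population)

# Sample method: only 500 units, chosen at random, are studied.
sample = rng.sample(population, 500)
sample_mean = statistics.mean(sample)

print(f"Units studied: census {len(population)}, sample {len(sample)}")
print(f"Mean wage: census {census_mean:.2f}, sample {sample_mean:.2f}")
```

The sample studies one unit in twenty, yet its mean should land close to the census mean. That small gap is the price paid for the saving in time, cost and effort.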

A good sample must be representative (it should mirror the whole population) and large enough. The main sampling techniques are: random sampling (every unit has an equal chance of being chosen, like a lottery, which avoids bias), stratified sampling (the population is divided into groups, or strata, and a sample is taken from each), and systematic sampling (every nth unit is chosen from a list). The danger of sampling is sampling error: results from the sample may differ from the true values for the population. It is reduced by choosing a proper, large, random sample.
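The three techniques can be tried out with a short Python sketch. The population of 100 students (60 in class XI, 40 in class XII) and the function names are hypothetical, chosen only to make the idea concrete. The stratified version assumes proportional allocation, which is one common choice and not the only one.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: 100 students, 60 in class XI and 40 in class XII.
population = [("XI", i) for i in range(1, 61)] + [("XII", i) for i in range(1, 41)]

def random_sample(units, n):
    """Random sampling: every unit has an equal chance, like a lottery."""
    return rng.sample(units, n)

def systematic_sample(units, n):
    """Systematic sampling: pick a random start, then take every k-th unit."""
    k = len(units) // n          # sampling interval
    start = rng.randrange(k)     # random starting point within the first k units
    return units[start::k][:n]

def stratified_sample(units, n):
    """Stratified sampling: group units into strata, then sample from each
    stratum in proportion to its size (assumed allocation rule)."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[0], []).append(unit)
    chosen = []
    for members in strata.values():
        # Rounding may make the total differ slightly from n for other sizes.
        share = round(n * len(members) / len(units))
        chosen.extend(rng.sample(members, share))
    return chosen

print(random_sample(population, 10))
print(systematic_sample(population, 10))
print(stratified_sample(population, 10))
```

With a sample of 10, the stratified version always takes 6 students from class XI and 4 from class XII, so the sample mirrors the make-up of the population. The random version reaches that balance only on average.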

Worked Example
Example 1: What is the difference between the census and the sample method?
Solution

It is about how much is studied.

  • Census: every unit of the population is studied (complete).
  • Sample: only a representative part is studied and used to represent the whole.
Worked Example
Example 2: Why is the sample method usually preferred over the census method?
Solution

Compare cost and speed.

  • The sample method is cheaper, faster and needs less effort.
  • The census is costly and slow.
Worked Example
Example 3: What is random sampling and why is it useful?
Solution

Every unit gets a fair chance.

  • In random sampling, every unit has an equal chance of being selected (like a lottery).
  • This avoids bias and makes the sample fair.

Key Points

    • Population/universe = whole group; unit = each member.
    • Census: study every unit (accurate but costly/slow; e.g. Census of India).
    • Sample: study a representative part (cheap, fast, common); must be representative + large.
    • Techniques: random, stratified, systematic; watch for sampling error.

Quick Check
Q1. Studying every single unit of the population is the:
Explanation: The census method studies every unit of the population.
Q2. In random sampling, every unit of the population has:
Explanation: Random sampling gives every unit an equal chance of being selected.

Methods of Collecting Primary Data

When primary data must be collected, the investigator can choose from several methods, depending on the purpose, area, time and money available:

  • Direct Personal Investigation — the investigator personally contacts and questions each informant face-to-face. It is accurate and reliable but suitable only for a small area; it is costly and time-consuming for large enquiries.
  • Indirect Oral Investigation — instead of the informants themselves, the investigator questions other people (witnesses) who know about them. Used when informants are unwilling or hard to reach (e.g. enquiry into drinking habits). A police investigation works this way.
  • Information through Correspondents — local agents or correspondents in different places collect and send information regularly. Newspapers gather news this way. It is cheap and wide-reaching but less accurate, as it depends on the correspondents.
  • Mailed Questionnaire — a list of questions (a questionnaire) is mailed/emailed to informants, who fill it in and return it. It covers a wide area cheaply, but only literate people can respond and many may not return it (low response rate).
  • Schedule Method — trained workers called enumerators carry the questionnaire (here called a schedule) to the informants and fill it in for them by asking the questions. This is how the Census of India is done; it suits large enquiries and even illiterate informants, but it is costly.

The key difference between a questionnaire and a schedule: a questionnaire is filled in by the informant, while a schedule is filled in by an enumerator on the informant's behalf. Whichever method is used, the questions must be clear, short, unambiguous and not personal or leading.

Worked Example
Example 1: In which method does the investigator personally question each informant face-to-face?
Solution

Direct contact with the informant.

  • Direct Personal Investigation.
  • Accurate but suited only to a small area.
Worked Example
Example 2: What is the difference between a questionnaire and a schedule?
Solution

It is about who fills it in.

  • A questionnaire is filled in by the informant.
  • A schedule is filled in by an enumerator on the informant's behalf.
Worked Example
Example 3: Which method is used to conduct the Census of India, and why is it suitable?
Solution

Trained enumerators visit everyone.

  • The schedule method (enumerators carry and fill the schedule).
  • It suits a large enquiry and works even with illiterate informants.

Key Points

    • Direct Personal (face-to-face, small area, accurate); Indirect Oral (question witnesses; police-style).
    • Correspondents (local agents; newspapers; cheap, less accurate).
    • Mailed Questionnaire (informant fills; wide & cheap; literate only, low response).
    • Schedule (enumerator fills; Census of India; large enquiries, even illiterate; costly).
    • Questionnaire = filled by informant; schedule = filled by enumerator.

Quick Check
Q1. A questionnaire is filled in by the ____, while a schedule is filled in by the ____.
Explanation: The informant fills a questionnaire; an enumerator fills a schedule.
Q2. The Census of India collects data using the:
Explanation: The Census uses the schedule method with trained enumerators.