[EpiData-list] Random selection of records
epidata-list at lists.umanitoba.ca
Tue Mar 4 16:47:09 CST 2008
This is an excellent question!
1. For random sampling, say you want to get 20% of the records (1 in 5).
With a large number of records, do this:
gen i pick =ran(5) // pick will be integer of values 0 to 4
select pick=0 // choose those with pick=0 (approx 1 in 5)
savedata sample
You will not always get exactly one fifth of the records, since pick is
determined at random for each record.
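
To see how much the sample size can vary with this approach, here is a
minimal sketch in Python (not EpiData syntax). The record list, the seed
values and the function name bernoulli_sample are made up for illustration;
the only idea taken from the commands above is "draw an integer from 0 to 4
for each record and keep the record when it is 0".

```python
import random

def bernoulli_sample(records, k=5, seed=None):
    """Keep each record independently with probability 1/k."""
    rng = random.Random(seed)
    # rng.randrange(k) plays the role of pick: an integer from 0 to k-1
    return [r for r in records if rng.randrange(k) == 0]

records = list(range(1, 101))  # 100 hypothetical record numbers
# Repeat the draw with ten different seeds and look at the sample sizes
sizes = [len(bernoulli_sample(records, seed=s)) for s in range(10)]
print(sizes)  # sizes scatter around 20 rather than being exactly 20
```

With only 100 records the sizes can differ noticeably from 20, which is why
this approach is better suited to large files.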
For systematic sampling (every fifth record) do this:
gen i pick = recnumber - 5*(recnumber div 5) // effectively pick = recnumber mod 5
select pick = 0 // or 1,2,3,4 - it's your choice
savedata sample
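
The arithmetic behind the systematic selection can be sketched in Python
(again, not EpiData syntax). This assumes record numbers start at 1; the
function name systematic_sample, its offset parameter and the 23 sample
records are hypothetical.

```python
def systematic_sample(records, k=5, offset=0):
    """Keep records whose 1-based record number mod k equals offset."""
    if not 0 <= offset < k:
        raise ValueError("offset must be between 0 and k-1")
    # enumerate from 1 to mimic record numbers that start at 1
    return [r for recnumber, r in enumerate(records, start=1)
            if recnumber % k == offset]

records = list(range(1, 24))  # 23 hypothetical records numbered 1..23
print(systematic_sample(records))            # [5, 10, 15, 20]
print(systematic_sample(records, offset=1))  # [1, 6, 11, 16, 21]
```

Changing offset corresponds to choosing pick = 1, 2, 3 or 4 instead of 0;
note that the sample size can differ by one depending on the offset when the
number of records is not a multiple of 5.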
The best way to randomly sample records, especially when there are not a
lot of records (<500 say) is this:
gen pick = rnd(1) // pick will be float between 0 and 1
sort pick
describe pick /q // this is only done to get the number of records into $obs1
select recnumber <= ($obs1 div 5) // select the first fifth of records after sorting
savedata sample
If you are just typing in the commands, there is no need to do the
describe command. You will know how many records to select after the sort.
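
The same sort-then-take-the-first-fifth idea, sketched in Python for
comparison (not EpiData syntax). The function name random_sort_sample, the
seed and the 23 sample records are assumptions made for the example.

```python
import random

def random_sort_sample(records, k=5, seed=None):
    """Sort records by a random key and keep the first len(records) // k."""
    rng = random.Random(seed)
    # attach a random float in [0, 1) to each record, like pick
    keyed = [(rng.random(), r) for r in records]
    keyed.sort(key=lambda pair: pair[0])
    n = len(records) // k  # integer division, like $obs1 div 5
    return [r for _, r in keyed[:n]]

records = list(range(1, 24))  # 23 hypothetical records
sample = random_sort_sample(records, seed=42)
print(len(sample))  # 4, since 23 // 5 == 4
```

Unlike the first method, the sample size here is fixed in advance, which is
what makes it preferable for small files.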
I encourage everyone to explore the results variables. After any freq,
tab, means or other analytic command, type
var result
to see what results variables are created temporarily. These are VERY
useful. All of the functions are described in the HELP that comes with
Analysis. Jens and I spent some time getting the help file into shape,
since this is your key technical reference for Analysis.
The sort-and-select method works because recnumber is always the record
number in the current record set and in its current order, after any sort
and select.
2. In Analysis, LIST will show record numbers unless you include /no on
the command line - e.g.
LIST a b c d e /no
Jamie
Shavinder wrote:
>
> Dear Jamie,
> Thank you so much for the solution for the query “INCOMPATIBLE KEY VARIABLES”. It worked well.
>
> Can you please help in two other situation also ?
>
> 1. There is a file say “try.rec” Is it possible to read it and write another file with 5 randomly selected records or selecting every 4th record (as done in systematic sampling). The documentation of the commands such as SET RANDOM SEED and RANDOM SIMULATIONS is not clear in the help file. I am attaching a sample file try.zip.
>
> 2.What is the equivalent of SET LISTREC=ON/OFF (EPI6) in Epi Analysis ver 2.03 ?
>