I have a project where I need to do some things to all files whose names start with "sub" but not "subs". The former are files I receive, the latter are files I produce. When it occurred to me that their common file name root might cause problems, I rewrote my code to name the files formerly known as "subs" to something different but (to me) still descriptive, and I picked "snap".

The program that presents this little problem cleans up after itself. Every week, if it's Friday, it deletes whatever "sub", "subs", "snaps" or what have you files it finds that are older than n weeks. So, eventually, after the change above, all the "subs" files will disappear. There will only be "snap" files left in their place, and the ambiguity between "sub" and "subs" will no longer be an issue.

In the meantime, though, I need a piece of code that lists file names that start with "sub" but not with "subs", which is easy enough. But it needs to work even after the last "subs" file will have left the disk. I also want it to make use of things already declared elsewhere. I basically want to be free to forget I ever wrote it long after it outlived its usefulness but, should it turn up to be needed again for some reason, I want it to pick up where it left off.  So here's how the - (minus) operator of the extended macro function list can handle that:

Right now, in the directory in question, there are four strings that file names can start with: "rat", "sub", "subs" and "snap". The first two are file names I don't want to change. Every week there will be a new file starting with either string in that directory. The last two, like I mentioned, I have more control over. The set of file names that start with "snap" is empty now but will fill up over time, and the set of file names that start with "subs" is not empty yet, but it's headed that way. I don't care about the "rat" files for this particular purpose, but whatever is irrelevant should not have to be explicitly excluded. Here's the code:

local from=strlower("`c(pwd)'")
local whichfiles "rat sub subs snap"
foreach name in `whichfiles' {
    local `name'list: dir "`from'" files "`name'*"
//  local `name'count: list sizeof `name'list
//  di "`name' count is ``name'count'"
}
local sub sub                        // (1)
local dropthese: list whichfiles-sub // (2)
di "`dropthese'"
foreach name in `dropthese' {
//  di "`name'"
    local sublist: list sublist-`name'list
//  local subcount: list sizeof sublist
//  di "now sub count is `subcount'"
}

This assumes that your directory of interest is c(pwd) -- type creturn list in Stata if this makes no sense. Notice that you have to declare the string "sub" as a local in order for the list operator minus to work -- see lines commented (1) and (2).

The beauty of the minus operator is that it acts the way you would expect it to from set theory. It will leave `sublist' unchanged if:

a) ``name'list' is empty (as in the file names that start with "subs" after the last such file will have been deleted) or

b) ``name'list' is not a proper subset (as in the file names that start with "rat", right now) or

c) both (the case of the file names that start with "snap" right now; remember I don't yet have any such files).

So I don't need to declare a `whichfiles' list with strictly the file names of interest now -- "sub" and "subs". The minus operator will gracefully handle everything. If I happen to have saved a more expansive `whichfiles' list in some other spot, for some other reason, then I can just use it from there. The lines I commented out were useful for testing, but will be a nuisance when this is live. I could have equally well enclosed the whole thing in quietly {}. Come to think of it, that would have been more elegant.