It's easily fast enough in C when you're running a 20 MHz clock. This scale runs at a 74 kHz clock rate, i.e. a 13.5 µs period. The actual C routine to sample the clock, detect the falling edge, read the data line and shift the bit into a 32-bit register compiles to 49 assembler instructions (7 lines of C), which take 9.8 µs (at 20 MHz the PIC executes one instruction cycle every 4 oscillator clocks, i.e. 200 ns each). It may need optimising for another scale. This compiler is pretty good, but being C it tends to use temporary variables in RAM (mainly because I haven't yet worked out how to force it to use registers for variables).
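The timing arithmetic above can be checked with a few lines of integer C. This is a rough sketch, assuming a PIC-style core where one instruction cycle takes 4 oscillator clocks and every instruction is single-cycle. Taken branches really cost two cycles, so treat the result as optimistic. The function names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: PIC-style core, 1 instruction cycle = 4 oscillator clocks,
   every instruction single-cycle (taken branches really cost two). */

/* Nanoseconds per instruction cycle for a given oscillator frequency. */
static uint32_t instr_cycle_ns(uint32_t fosc_hz)
{
    return (uint32_t)(4000000000ULL / fosc_hz);
}

/* Nanoseconds per period of the scale's clock line (truncated). */
static uint32_t clock_period_ns(uint32_t clk_hz)
{
    return (uint32_t)(1000000000ULL / clk_hz);
}

/* Time taken by n single-cycle instructions, in nanoseconds. */
static uint32_t loop_time_ns(uint32_t n_instr, uint32_t fosc_hz)
{
    return n_instr * instr_cycle_ns(fosc_hz);
}

/* How many whole instruction cycles fit in one scale clock period. */
static uint32_t instr_per_period(uint32_t clk_hz, uint32_t fosc_hz)
{
    return clock_period_ns(clk_hz) / instr_cycle_ns(fosc_hz);
}
```

With the numbers from this post, `instr_cycle_ns(20000000)` gives 200 ns, `loop_time_ns(49, 20000000)` gives 9800 ns and `instr_per_period(74000, 20000000)` gives 67 instruction cycles per scale clock period. At 40 MHz the same period holds about 135.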

Code:
 
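//now in lead up to second data burst
//CK and DT are the clock and data input pins, defined elsewhere
unsigned char count = 24;     //bits per burst
unsigned long data = 0;       //unsigned so right shifts are always logical
while(count > 0)
{
    data >>= 1;               //prepare for next bit (received LSB first)
    while(CK == 1);           //wait for clock to go low
    if(DT == 1)               //sample data just after the falling edge
        data |= 0x00800000UL; //new bit enters at bit 23
    while(CK == 0);           //wait for clock to go high
    count--;
}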
//do stuff with data...
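To check the bit order without hardware, the shift-in logic can be modelled on a PC. This is a sketch, not the firmware: `bits[]` is a hypothetical array standing in for the level of DT at each falling edge of CK, and `shift_in_24` is a name made up for illustration.

```c
#include <stdint.h>

/* Host-side model of the receive loop. bits[i] is the DT level seen at
   the i-th falling edge of CK (hypothetical input, not captured data). */
static uint32_t shift_in_24(const uint8_t bits[24])
{
    uint32_t data = 0;
    for (int i = 0; i < 24; i++) {
        data >>= 1;                 /* earlier bits move toward bit 0 */
        if (bits[i])
            data |= 0x00800000UL;   /* newest bit enters at bit 23 */
    }
    return data;                    /* first bit received ends in bit 0 */
}
```

Because everything shifts right before each new bit enters at bit 23, the first bit received ends up in bit 0: a burst whose only 1 is the first bit returns 0x000001, and one whose only 1 is the last bit returns 0x800000.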

I doubt I could read three scales on one PIC in C though, and I want to do that, so the real one will use a 20-pin PIC18F1220 at 40 MHz...
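One possible way to read several scales from one PIC (not what this post does) is to stop spinning on a single clock line and instead poll the whole port repeatedly, detecting each scale's falling edges by comparing against the previous sample. The sketch below assumes hypothetical wiring (scale n has its clock on port bit 2n and data on bit 2n+1), ignores frame synchronisation (resyncing on the idle gap between bursts), and has not been timed on a PIC. Each pass must finish in less than the shortest clock half-period, roughly 6.7 µs if the 74 kHz clock has about a 50% duty cycle.

```c
#include <stdint.h>

#define NUM_SCALES 3
#define FRAME_BITS 24

/* Hypothetical wiring: scale n clock on port bit 2n, data on bit 2n+1. */
#define CK_MASK(n) (1u << (2 * (n)))
#define DT_MASK(n) (1u << (2 * (n) + 1))

typedef struct {
    uint8_t  prev_ck;   /* clock level seen on the previous poll */
    uint8_t  count;     /* bits received in the current frame */
    uint8_t  ready;     /* set when a full frame is in 'value' */
    uint32_t shift;     /* frame being assembled, LSB first */
    uint32_t value;     /* last complete frame */
} scale_rx;

/* Process one port snapshot for every scale. No channel waits on its own
   clock, so one slow scale cannot block the others. */
static void poll_scales(scale_rx rx[NUM_SCALES], uint8_t port)
{
    for (int n = 0; n < NUM_SCALES; n++) {
        uint8_t ck = (port & CK_MASK(n)) ? 1 : 0;
        if (rx[n].prev_ck && !ck) {              /* falling edge on CK */
            rx[n].shift >>= 1;
            if (port & DT_MASK(n))
                rx[n].shift |= 0x00800000UL;     /* same bit order as above */
            if (++rx[n].count == FRAME_BITS) {   /* frame complete */
                rx[n].value = rx[n].shift;
                rx[n].ready = 1;
                rx[n].count = 0;
                rx[n].shift = 0;
            }
        }
        rx[n].prev_ck = ck;
    }
}
```

On the PIC, `poll_scales` would be called in a tight loop with the raw port value; the main code checks `rx[n].ready`, uses `rx[n].value` and clears the flag.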