Solved: Refresh token & network timeout

shurik · ‎01-06-2016

Regarding refresh tokens, is the following correct?

* There is only one valid refresh token at any given time

* The refresh token is re-issued and the old token is invalidated every time a "refresh token" request is sent to Fitbit

If that's correct then we have the following situation

1. We sent a "refresh token" request. The request timed out (we got a network timeout exception on our side), so we never received the response, due to a random network issue. But it looks like the request itself did reach Fitbit's servers and the tokens were re-issued on Fitbit's side.

2. We retried with the old tokens and got "Refresh token invalid or expired".

Basically, if a refresh token request fails as described above, there is no way to retry. The sync is performed by a background app without user intervention, so we cannot "redirect to Fitbit for re-authentication" when this happens.

Does that make sense? Any advice on how to mitigate this issue? This has happened only once so far, but because refresh token requests are sent pretty often, I think it will happen again.

JeremiahFitbit · ‎01-06-2016

Unfortunately, at this time, I don't have any recommendations for mitigating this scenario. We are working on a more forgiving exchange, but I don't have an ETA to share yet.

Do you know how long your request was open before the timeout?

shurik · ‎01-06-2016

Thanks for the quick response and the confirmation.

We use HttpClient (.NET) with default timeout settings. I've just checked, and it looks like the default timeout is 100 seconds. So the request was open for 100 seconds before it timed out.

jmitchell38488 · ‎01-06-2016

There could be an issue in your network through a firewall, or a filter, or even a router/switch. Have you tried looking at whether or not you have faulty hardware, or throughput issues? What about DNS settings and Dynamic DNS?

Is there any return output on the stream when you're connected? Anything at all? Have you tried making the requests using cURL or something like Postman?

Have you been able to figure out what the random network error is? It really depends on whether or not you host and manage the network yourself, or you rely on another party to do so.

Try making the request from an external computer and see if the same issue happens.

As for refreshing token issues, I encountered a similar issue (without network issues), here:

https://community.fitbit.com/t5/Web-API/Invalid-or-expired-refresh-tokens/m-p/1057036

shurik · ‎01-07-2016

The issue is that no matter what you do, communications over the network are unreliable. At a large scale, some refresh token requests will fail. And with the current setup it is not possible to retry.

As for that specific network issue, I do not know, and I think it's not the point. We send maybe thousands of requests per day. That day, 3 requests timed out and one of them led to the lost token issue.

jmitchell38488 · ‎01-07-2016

You still have to diagnose where the network issue is. It may well be on Fitbit's side, but if you aren't looking through your network logs and checking the network history and connections, then you won't know where the drop occurred. If it happened on Fitbit's end, then you'll need to wrap the refresh/request in another try/catch to capture the error and try to refresh again.

I would certainly lower the connection timeout considerably, to around a couple of seconds, considering Fitbit doesn't take long to return any data. You should also check the response code on the connection to detect whether the connection failed or the server returned an HTTP error.

If you were able to connect and maintain the connection, it's quite possible that the connection was dropped by FitBit, or in between, but your socket remained open for the duration.

I would think that it's more likely to be an internal network issue, but that's just my guess until you're able to track where the drops occur. But as you say, out of thousands of requests per day, only a minute amount are dropping.

JeremiahFitbit · ‎01-07-2016

The current proposal is to allow a brief grace period where multiple requests with the expired access token's refresh token will result in the same new replacement access and refresh token. We're still determining how long would be reasonable and are considering under 2 minutes. In this scenario, you would want to set your HTTP library's timeout to something shorter to ensure you have time to retry the request.
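To make the timeout advice concrete, here is a rough budget calculation in Python. It is only a sketch: the 120-second window is the upper bound mentioned above (the final length was not decided at the time of this post), and the function name and the backoff parameter are illustrative assumptions.

```python
import math


def max_attempts(window_s: float, timeout_s: float, backoff_s: float = 0.0) -> int:
    """Worst-case number of refresh attempts that can start inside the window.

    Worst case assumed here: the first attempt reaches the server right away
    (which opens the grace window) and then hangs until the client timeout.
    Each failed attempt therefore costs timeout_s + backoff_s, and a retry
    only helps if it starts before the window closes.
    """
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    per_attempt = timeout_s + backoff_s
    # Attempt k (0-based) starts at k * per_attempt; count the starts < window_s.
    return max(1, math.ceil(window_s / per_attempt))


print(max_attempts(120, 100))  # default .NET HttpClient timeout
print(max_attempts(120, 30))   # a shorter timeout
```

Under these assumptions, a 100-second timeout leaves room for only one retry inside a 2-minute window, while a 30-second timeout leaves room for three.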

aarondcoleman · ‎01-08-2016

@JeremiahFitbit I'm also fearful of this scenario. We have to consider the worst-case scenario, which is that a client system fails (goes down, network, whatever) in the middle of a token exchange, and 2 minutes might be too short to revive a system or fail over to another cluster. Since most of what we do is background, we also don't have the option to re-prompt end users for a re-auth. We'd just be totally locked out.

What about invalidating old refresh tokens on the first use of a new access token? I see other APIs taking this approach to avoid the lockout scenario.

Thanks

--Aaron

Using Fitbits in Research? Check out Fitabase --www.fitabase.com

JeremiahFitbit · ‎01-08-2016

@aarondcoleman wrote:

What about invalidating old refresh tokens on the first use of a new access token? I see other APIs taking this approach to avoid the lockout scenario.

That would be the most ideal and was the original design we considered. However, this is significantly more difficult to implement at our scale. It is still a consideration for the future.
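For illustration, the "invalidate the old refresh token on first use of the new access token" idea can be modeled in a few lines of Python. This is a toy in-memory sketch, not how Fitbit or any other provider implements it; the class and method names are hypothetical, and token expiry is ignored.

```python
import secrets


class ToyTokenServer:
    """Toy model: the old refresh token stays valid until the NEW access
    token is first used, so a lost refresh response can simply be retried."""

    def __init__(self):
        self.access = secrets.token_hex(8)
        self.refresh_token = secrets.token_hex(8)
        self.pending = None  # (access, refresh) pair issued but not yet used

    def refresh(self, refresh_token):
        if refresh_token != self.refresh_token:
            raise PermissionError("invalid_grant")
        if self.pending is None:
            self.pending = (secrets.token_hex(8), secrets.token_hex(8))
        # Repeated calls with the old token return the same pending pair.
        return self.pending

    def api_call(self, access_token):
        if self.pending is not None and access_token == self.pending[0]:
            # First use of the new access token proves the client received
            # the refresh response, so the old tokens can be retired now.
            self.access, self.refresh_token = self.pending
            self.pending = None
        if access_token != self.access:
            raise PermissionError("invalid_token")
        return "ok"
```

In this model a client whose refresh response was lost can repeat the refresh with the old token at any later time, because nothing is invalidated until the new access token shows up in an API call.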

aarondcoleman · ‎01-08-2016

@JeremiahFitbit Hmmm...

What about allowing some number of most recently exchanged tokens to be retried with the original refresh expiry? Like, 10-20 of these? Maybe even fewer. That way at least we can set our side to refresh these and stay under that threshold such that we don't try too many at a time where if there was a systemwide failure, we'd be ok to pick up where we left off.

On your end, you'd just have to keep a state object of X last exchanged tokens. I realize "just" is easier said than done at your scale, but at least this is tied to just the refresh token endpoint and not every single API request, which I'd guess is the scale challenge.

Thanks!

--Aaron

Using Fitbits in Research? Check out Fitabase --www.fitabase.com

jmitchell38488 · ‎01-08-2016

@aarondcoleman Having worked with, and built, APIs, I can say that keeping stale data without relying on proper source identification, other than a single key, is dangerous. While the suggestion is made in good faith to help keep systems up and running in the event of a system failure, no one should ever build an API to support client system failure redundancy. It's far too complex and time-consuming, and it introduces many more security issues than keeping it simple.

Since the refresh token request is essentially 1 request, why aren't you attempting to refresh the token in a try/catch loop until you get the refreshed token? Unless otherwise stated by Fitbit as inappropriate, or not possible because of a refresh token timeout, this would be the simplest solution. As a developer who's built APIs, I can't think of any justification to maintain old refresh/access token combinations without security considerations, certainly where source identification isn't maintained or determined outside of keys.

aarondcoleman · ‎01-08-2016

@jmitchell38488 wrote:
@aarondcoleman Having worked with, and built, APIs, I can say that keeping stale data without relying on proper source identification, other than a single key, is dangerous.

@jmitchell38488 I don't think I'm introducing any additional security concern. My suggestion was to allow refresh tokens to be retried only if their expiration hadn't been used. I'm basically thinking we could do daily / pre-emptive batch refreshes ahead of expirations.

@jmitchell38488 wrote:
no one should ever build an API to support client system failure redundancy.

Respectfully, I disagree. Since HTTP is inherently stateless, we do have to consider state in application/protocol design. Part of that is the possibility that a change in state isn't aligned between provider and consumer.

Thanks,

--Aaron

Using Fitbits in Research? Check out Fitabase --www.fitabase.com

Qualcee · ‎02-24-2016

Yeah, I have the same issue. In my post I've made some suggestions. https://community.fitbit.com/t5/Web-API/Refresh-token-no-good-after-fitbit-server-times-out/m-p/1200...

Something I noted in the post, is this: According to the oauth2 spec https://tools.ietf.org/html/rfc6749#page-10, giving a new refresh token is optional.

Another thing, and I've documented this with data, is that this is not a client failure. It was a Fitbit API server not responding to my request in less than 30 seconds. Okay, there is tons of network in between, so who knows where it actually happens. I noticed someone above was suggesting that Fitbit shouldn't have to code for client problems, and I agree, but it's not necessarily client issues only. I think Fitbit has already acknowledged that this is a legitimate issue and they are figuring out what to do.

Since we provide the client key and secret, this whole business of needing a refresh token to get a new access token is really unnecessary, since it is not what provides the security; the expiring access token does. If we are using SSL and are properly securing our client key and secret in our application, then security is not compromised. If we aren't doing those things, then we would be a bad client and we get what we deserve. In the case of my native mobile application, my server does all the work. The key and secret aren't even stored on the device or in the server application code, but instead are stored encrypted, and we only talk to Fitbit over SSL.

Qualcee · ‎02-24-2016

Reposted here from https://community.fitbit.com/t5/Web-API/Refresh-token-no-good-after-fitbit-server-times-out/m-p/1200...

Like a lot of people, I've had some problems with refresh tokens. I have also done several OAuth2 implementations. I've gone so far as to log every request and response to make sure I save the new refresh token after I make a refresh access token call.

In general it's working. Sometimes it just fails. The problem is that there is no room for error. When we are trying to collect data for a user that does not log in regularly, losing access to their data is a major problem.
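Since there is no room for error when saving the replacement tokens, one way to reduce the risk on the client side is to persist the new pair atomically before using it. The following Python sketch assumes tokens are stored in a local JSON file; the file layout and function names are hypothetical, and a real system would more likely use a database transaction.

```python
import json
import os
import tempfile


def save_tokens_atomically(path: str, access_token: str, refresh_token: str) -> None:
    """Write the token pair so that a crash leaves either the old file or
    the new one on disk, never a half-written mix."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tokens-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"access_token": access_token, "refresh_token": refresh_token}, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # the rename is atomic on POSIX systems
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def load_tokens(path: str) -> dict:
    """Read back the most recently saved token pair."""
    with open(path) as f:
        return json.load(f)
```

Writing to a temporary file in the same directory and renaming it over the old one means a crash mid-write cannot destroy the only copy of a still-valid refresh token.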

Yesterday I had 3 connection time out issues, 1 of which invalidated the refresh token.

Issue 1. I got this response from fitbit:

Response:

    HTTP/1.1 522 Origin Connection Time-out
    X-Frame-Options: SAMEORIGIN
    Date: Tue, 23 Feb 2016 23:57:49 GMT
    Transfer-Encoding: chunked
    CF-RAY: 2796dfacc3a02258-LAX
    Content-Type: text/html; charset=UTF-8
    Connection: keep-alive
    Server: cloudflare-nginx
    Pragma: no-cache

    <!DOCTYPE html>
    <!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
    <!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
    <!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
    <!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
    <head>
    <meta http-equiv="set-cookie" content="cf_use_ob=0; expires=Tue, 23-Feb-16 23:58:18 GMT; path=/">
    <title>api.fitbit.com | 522: Connection timed out</title>

In this case, the refresh token still worked. I'm guessing this is because we never got to the actual fitbit application server.

Issue 2. My java socket timed out. "java.net.SocketTimeoutException: Read timed out", which means I never got a response from the fitbit server. In this case the refresh token still worked next time. Again I also assume that the request didn't make it to the fitbit application server.

Issue 3. Again java socket timed out. This time, the refresh token was no good on the next use. I'm assuming that this did in fact get to the fitbit application server, but never made it back to me in time.

My java socket timeout is set to 30 seconds. And now I can't collect data for this user. All of our systems are not perfect and we need a solution for this.

I think you need some flexibility on refresh tokens. It might be as simple as just adding an api that, since fitbit knows that the user hasn't revoked access, but I need a clean refresh token, I can call an api with my credentials, and the user id and in return get a clean refresh token. Maybe even include the refresh token I have, so you can check some history. Maybe just stop changing the refresh token. According to the oauth2 spec https://tools.ietf.org/html/rfc6749#page-10, giving a new refresh token is optional.

There are reasonable solutions which make this api access more flexible, yet still secure.
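The three issues described in this post show that a timeout is ambiguous from the client's point of view: the request may or may not have been processed. A client could encode that as a small decision function, sketched below in Python. The enum and function names are hypothetical, the handling of 429 and other 4xx codes is an assumption, and it is also assumed that the server reports a rejected refresh token with the error type invalid_grant.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"          # new token pair received: persist it
    RETRY_SAME_TOKEN = "retry"   # outcome unknown, or request never processed
    REAUTH_REQUIRED = "reauth"   # server rejected the refresh token
    GIVE_UP = "give_up"          # other client error: needs investigation


def classify_refresh_result(status: Optional[int],
                            error_type: Optional[str] = None) -> Outcome:
    """Decide what to do after a refresh attempt.

    status is the HTTP status code, or None when the client timed out or
    the connection dropped before any response arrived.
    """
    if status is None:
        # Issues 2 and 3: the client cannot tell these apart at this point.
        return Outcome.RETRY_SAME_TOKEN
    if status == 200:
        return Outcome.SUCCESS
    if status >= 500 or status == 429:
        # Issue 1 (for example the 522 above) or rate limiting (assumed).
        return Outcome.RETRY_SAME_TOKEN
    if error_type == "invalid_grant":
        return Outcome.REAUTH_REQUIRED
    return Outcome.GIVE_UP
```

The key design choice is that a timeout never causes the stored refresh token to be discarded; only an explicit rejection from the server does.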

karkum1 · ‎02-24-2016

We're seeing several similar problems with refresh tokens as well. We're getting errors of this form:

"errorType":"invalid_grant","message":"Refresh token invalid: when upgrading users from OAuth 1 to OAuth 2. We are also seeing similar problems when querying with just OAuth 2.

In addition, we are getting time out errors from the fitbit api using oauth2, which prevent us from reusing that refresh token.

There was some discussion about adding a grace period for refreshing tokens (from an above post by @JeremiahFitbit). Has there been any update on that?

Thank you for your help!!

petro · ‎02-25-2016

This problem is persisting today - can we get an update from Fitbit please? This impacts our production operations.

Shalmezad · ‎02-29-2016

@JeremiahFitbit wrote:
Unfortunately, at this time, I don't have any recommendations for mitigating this scenario. We are working on a more forgiving exchange, but I don't have an ETA to share yet.

Any update on this?

JeremiahFitbit · ‎02-29-2016

Over the weekend, Fitbit's SoftLayer datacenter experienced some connectivity issues. With the information I currently have regarding the cause, failed refresh requests should not have resulted in a used access token. Re-requesting should have resulted in a successful refresh.

In *every* claim so far that we have investigated of a refresh token request failing, it has failed for a legitimate reason. If you would like us to investigate a specific case, please contact us privately with your client id, the user id, and the UTC timestamp of the failed refresh request. We can only look at failures within the last 3 days.

This week, we will be releasing a refresh token grace period. This means that an application will have 2 minutes after the refresh token is first used to make duplicate refresh requests and receive the same replacement refresh token. This will help with concurrency issues, which have been the cause of most cases we have investigated. Apps should set the refresh request timeout to 30 seconds or less, so as to allow for additional requests within the 2-minute window.
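A client-side retry that fits this grace period might look like the Python sketch below. It is only an illustration under stated assumptions: `send` is a hypothetical callable that performs the HTTP refresh request and returns the new (access token, refresh token) pair, `RefreshTimeout` is a hypothetical exception it raises on a client timeout, and the 30-second and 120-second defaults come from the post above.

```python
import time
from typing import Callable, Tuple


class RefreshTimeout(Exception):
    """Raised by `send` when no response arrived before the client timeout."""


def refresh_with_retry(
    send: Callable[[str, float], Tuple[str, str]],
    refresh_token: str,
    timeout_s: float = 30.0,
    window_s: float = 120.0,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, str]:
    """Retry a timed-out refresh with the SAME refresh token while the
    grace window can still be open."""
    first_sent = clock()
    while True:
        try:
            return send(refresh_token, timeout_s)
        except RefreshTimeout:
            # The server may or may not have rotated the token. Its window
            # opens no earlier than our first send, so measuring from
            # first_sent is a conservative estimate of when it closes.
            if clock() - first_sent >= window_s:
                raise
```

If the function finally raises, the old refresh token may still be valid (the requests may never have reached the server), so the caller should keep it and try again later rather than discard it.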

Qualcee · ‎03-01-2016

I can confirm that over the weekend I did get 3 instances of a refresh token rejection, but a subsequent call worked fine, which pleasantly surprised me, and it wasn't even in the two-minute window.

Today I will be implementing the retry, thanks!

Any chance this will push out the cutover date? We need some time to get this coded, pushed to test, tested, and then pushed to prod. The time needed will vary based on whatever process the developer has to follow; I, for example, am at a big company with a full RFC process for scheduling production outages for a code push.

karkum1 · ‎03-01-2016

@JeremiahFitbit thank you so much!

We also noticed syncing problems over the weekend, but that seems to have been resolved this week. The refresh token grace period should help us out a lot! Thanks!
