Using Duplicity with Microsoft SharePoint/OneDrive for Business

For a long time I’ve been using Duplicity as my primary backup tool, mostly because of its space-efficient incremental backups. Since I use Linux both at work and at home, this also applies to my corporate system.

Since Linux is not very well supported in my company, a lot of the set-up is done in a do-it-yourself way. In a way I like that, as it gives me freedom of choice as to the tools used. As far as backup is concerned, Duplicity was the obvious choice. What I found slightly difficult was finding the right place to store my backups, as using a private USB HDD or Dropbox is obviously not the right choice for corporate data.

Duplicity comes with a wide choice of storage back-ends, starting from local filesystem, network file services like FTP, SFTP or WebDAV up to cloud storage systems like Amazon S3 or popular consumer systems such as Google Drive, Dropbox or OneDrive.

SharePoint/OneDrive for Business as a storage back-end

Initially I became interested in the last option, as my company is using Microsoft Office 365 Enterprise, which comes with 1TB of personal OneDrive storage. Unfortunately, after some quick research I learned that the corporate OneDrive does not have much in common with the regular (private) OneDrive (besides the name, of course). The corporate OneDrive storage is essentially a variant of Microsoft SharePoint with a more user-friendly frontend. This means that Duplicity’s personal OneDrive back-end will not work with the corporate OneDrive service.

SharePoint has a RESTful API to access its contents, which is not overly complicated in my opinion. I was thinking of developing a back-end module for Duplicity for it. Fortunately, after some research I discovered that besides its native REST API SharePoint also offers access over WebDAV, which is slightly less documented. Since Duplicity has a native WebDAV back-end, this sounded like a way forward.

Authentication

The immediate problem that I faced was authentication. SharePoint offers a wide range of authentication methods, from basic username/password to SAML-based ones. My company uses the latter, which Duplicity lacks support for.

$ wget -S https://xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/ --method=PROPFIND
--2017-02-24 09:29:03-- https://xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/Backup/Duplicity/
Resolving xxx.sharepoint.com... 104.146.250.25
Connecting to xxx.sharepoint.com|104.146.250.25|:443... connected.
HTTP request sent, awaiting response...
 HTTP/1.1 403 FORBIDDEN
 Content-Type: text/plain; charset=utf-8
 Server: Microsoft-IIS/8.5
 X-SharePointHealthScore: 0
 SPRequestGuid: 9b16d89d-9030-3000-b8ce-0084932e613c
 request-id: 9b16d89d-9030-3000-b8ce-0084932e613c
 X-Forms_Based_Auth_Required: https://xxx.sharepoint.com/_forms/default.aspx?ReturnUrl=/_layouts/15/error.aspx&Source=/personal/john_doe_xxx_com
 X-Forms_Based_Auth_Return_Url: https://xxx.sharepoint.com/_layouts/15/error.aspx
 X-MSDAVEXT_Error: 917656; Access+denied.+Before+opening+files+in+this+location%2c+you+must+first+browse+to+the+web+site+and+select+the+option+to+login+automatically.
 X-IDCRL_AUTH_PARAMS_V1: IDCRL Type="BPOSIDCRL", EndPoint="/personal/john_doe_xxx_com/_vti_bin/idcrl.svc/", RootDomain="sharepoint.com", Policy="MBI"
 X-Powered-By: ASP.NET
 MicrosoftSharePointTeamServices: 16.0.0.6216
 X-Content-Type-Options: nosniff
 X-MS-InvokeApp: 1; RequireReadOnly
 P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
 Date: Fri, 24 Feb 2017 08:29:02 GMT
 Content-Length: 13
2017-02-24 09:29:03 ERROR 403: FORBIDDEN.
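
The tell-tale sign in the response above is the 403 status combined with the X-Forms_Based_Auth_Required header. As a rough illustration, here is a minimal Python sketch that recognises this situation from a status code and a dictionary of response headers. The function name forms_auth_url is made up for this example, and the sample headers are copied from the wget output above.

```python
def forms_auth_url(status, headers):
    """Return the login URL if the response demands forms-based (cookie) auth, else None."""
    if status != 403:
        return None
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("x-forms_based_auth_required")


# Two of the headers from the wget output above.
sample = {
    "Content-Type": "text/plain; charset=utf-8",
    "X-Forms_Based_Auth_Required": (
        "https://xxx.sharepoint.com/_forms/default.aspx"
        "?ReturnUrl=/_layouts/15/error.aspx&Source=/personal/john_doe_xxx_com"
    ),
}
print(forms_auth_url(403, sample))
```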

This type of authentication requires a set of cookies to be passed. The basic idea is that the browser should open the authentication site, which will ask for credentials, such as username, password or PIN/OTP. Once the user is authenticated a ticket will be generated and set as a cookie and a redirect is issued to the original site, which now grants access based on the cookie.

The cookie can be obtained by visiting the site mentioned in the X-Forms_Based_Auth_Required header. However, there is another way to obtain the cookie, which doesn’t involve displaying a webpage. You need to send a SOAP request to https://login.microsoftonline.com/extSTS.srf containing your username and password. Upon successful completion the response will contain a token that has to be passed to the login service of your SharePoint site (http://your-sharepoint-site.com/_forms/default.aspx?wa=wsignin1.0). If everything checks out the response will contain some HTML, which can be safely ignored. What matters are the two cookies set in return: FedAuth and rtFa. These can be used to authenticate with the SharePoint site from now on, including the native REST API and WebDAV.

Automating authentication

In order to automate the above authentication tasks I’ve written a Python script that authenticates against the Microsoft login site and retrieves the necessary cookies from SharePoint.

#!/usr/bin/python

from __future__ import print_function

try:
    from http.cookiejar import CookieJar
except ImportError:
    from cookielib import CookieJar

try:
    from urllib.error import URLError
except ImportError:
    from urllib2 import URLError

try:
    from urllib.parse import urlparse
except ImportError:
    from urlparse import urlparse

try:
    from urllib.request import urlopen, build_opener, HTTPCookieProcessor, Request
except ImportError:
    from urllib2 import urlopen, build_opener, HTTPCookieProcessor, Request

import sys
import xml.etree.ElementTree as ET


authXml = """<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
xmlns:a="http://www.w3.org/2005/08/addressing"
xmlns:u="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">
<s:Header>
<a:Action s:mustUnderstand="1">http://schemas.xmlsoap.org/ws/2005/02/trust/RST/Issue</a:Action>
<a:ReplyTo>
<a:Address>http://www.w3.org/2005/08/addressing/anonymous</a:Address>
</a:ReplyTo>
<a:To s:mustUnderstand="1">https://login.microsoftonline.com/extSTS.srf</a:To>
<o:Security s:mustUnderstand="1"
xmlns:o="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">
<o:UsernameToken>
<o:Username>{0}</o:Username>
<o:Password>{1}</o:Password>
</o:UsernameToken>
</o:Security>
</s:Header>
<s:Body>
<t:RequestSecurityToken xmlns:t="http://schemas.xmlsoap.org/ws/2005/02/trust">
<wsp:AppliesTo xmlns:wsp="http://schemas.xmlsoap.org/ws/2004/09/policy">
<a:EndpointReference>
<a:Address>{2}</a:Address>
</a:EndpointReference>
</wsp:AppliesTo>
<t:KeyType>http://schemas.xmlsoap.org/ws/2005/05/identity/NoProofKey</t:KeyType>
<t:RequestType>http://schemas.xmlsoap.org/ws/2005/02/trust/Issue</t:RequestType>
<t:TokenType>urn:oasis:names:tc:SAML:1.0:assertion</t:TokenType>
</t:RequestSecurityToken>
</s:Body>
</s:Envelope>
"""

def main():
    # Three arguments are required, so sys.argv must hold four entries.
    if len(sys.argv) < 4:
        print("Usage: get-sharepoint-auth-cookie.py endpointURL username password", file=sys.stderr)
        exit(1)

    endpoint = sys.argv[1]
    username = sys.argv[2]
    password = sys.argv[3]

    authReq = authXml.format(username, password, endpoint)
    try:
        request = urlopen("https://login.microsoftonline.com/extSTS.srf", authReq.encode('utf-8'))
    except URLError:
        print("Failed to send login request.", file=sys.stderr)
        exit(1)

    ns = {"soap": "http://www.w3.org/2003/05/soap-envelope",
          "wssec": "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" }
 
    authRespTree = ET.parse(request)
    authToken = None
    fault = authRespTree.find(".//soap:Fault", ns)
    if fault is not None:
        reason = fault.find("soap:Reason/soap:Text", ns)
        if reason is not None:
            reason = reason.text
        else:
            reason = "*Unknown reason*"
        print("Failed to retrieve authentication token: {}".format(reason), file=sys.stderr)
        exit(1)

    tokenElm = authRespTree.find(".//wssec:BinarySecurityToken", ns)
    if tokenElm is None:
        print("Failed to retrieve authentication token.", file=sys.stderr)
        exit(1)
    else:
        authToken = tokenElm.text

    endpointUrl = urlparse(endpoint)
    if endpointUrl.scheme not in ["http", "https"] or not endpointUrl.netloc:
        print("Invalid endpoint URL: {}".format(endpoint), file=sys.stderr)
        exit(1)

    cookiejar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(cookiejar))
    try:
        request = Request("{0}://{1}/_forms/default.aspx?wa=wsignin1.0".format(
            endpointUrl.scheme, endpointUrl.netloc))
        response = opener.open(request, data=authToken.encode('utf-8'))
        cookieStr = ""
        cookiesFound = []
        for cookie in cookiejar:
            if cookie.name in ("FedAuth", "rtFa"):
                cookieStr += cookie.name + "=" + cookie.value + "; "
                cookiesFound.append(cookie.name)

        if "FedAuth" not in cookiesFound or "rtFa" not in cookiesFound:
            print("Incomplete cookies retrieved.", file=sys.stderr)
            exit(1)
        print(cookieStr)
    except URLError as x:
        print("Failed to login to SharePoint site: {}".format(x.reason), file=sys.stderr)
        exit(1)

if __name__ == '__main__':
    main()

The complete script can also be downloaded from here as WordPress seems to mess up Python formatting.
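
One caveat with the script above: the username, password and endpoint are inserted into the SOAP envelope with format() exactly as given. A password containing characters such as &, < or > would produce malformed XML, and the login request would most likely be rejected. A minimal sketch of a safer way to fill the template, assuming the same three placeholders, could look like this. The function name build_auth_request and the shortened demo template are made up for illustration only.

```python
from xml.sax.saxutils import escape


def build_auth_request(template, username, password, endpoint):
    """Fill the SOAP template, escaping XML special characters in each value."""
    return template.format(escape(username), escape(password), escape(endpoint))


# Shortened stand-in for the authXml template, for illustration only.
demo_template = (
    "<o:Username>{0}</o:Username>"
    "<o:Password>{1}</o:Password>"
    "<a:Address>{2}</a:Address>"
)
print(build_auth_request(demo_template, "john.doe@xxx.com", "p&ss<word>",
                         "https://xxx.sharepoint.com/"))
```

In the script this would replace the plain authXml.format(username, password, endpoint) call.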

The script accepts three arguments:

  • endpointURL – the URL of your personal SharePoint site (e.g. https://xxx.sharepoint.com/personal/john_doe_xxx_com/)
  • username – your SharePoint username, usually the company e-mail address (e.g. john.doe@xxx.com)
  • password – your account password. If your company uses Azure Active Directory two-factor authentication you will need to create a dedicated application password. Otherwise you can try your regular Active Directory password.

The script will print the string of cookies on the standard output. This string is ready to be put as a value of the Cookie HTTP header.
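
As a rough illustration of how the cookie string can be used outside Duplicity, here is a minimal Python 3 sketch that builds a WebDAV PROPFIND request carrying it. The function name build_propfind is made up for this example, and the Depth value is just one reasonable choice. Whether the server accepts the request still depends on your authentication set-up.

```python
from urllib.request import Request


def build_propfind(url, cookie_str):
    """Build a WebDAV PROPFIND request that carries the SharePoint cookies."""
    headers = {
        # The cookie script ends its output with "; ", so trim that off.
        "Cookie": cookie_str.strip().rstrip(";"),
        # Ask only for the collection itself and its direct children.
        "Depth": "1",
    }
    return Request(url, headers=headers, method="PROPFIND")


# Usage (needs network access and valid cookies):
#   from urllib.request import urlopen
#   req = build_propfind(
#       "https://xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/",
#       cookies)
#   response = urlopen(req)
```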

Whether the script works in your environment or not may depend on the actual authentication configuration. I’m sure that there are ways to block such access to SharePoint and some administrators may have chosen to do so.

Making Duplicity work with cookies

Unfortunately, retrieving the cookies alone will not help with Duplicity, as by default there is no way to pass them to the HTTP connection code. To do that, a change in the Duplicity code is needed. You need to edit the backends/webdavbackend.py file, which implements the WebDAV backend, and add the three lines that read the AUTH_COOKIES environment variable and copy it into the Cookie header:

class WebDAVBackend(duplicity.backend.Backend):
    [...]

    def __init__(self, parsed_url):
        duplicity.backend.Backend.__init__(self, parsed_url) 
        self.headers = {'Connection': 'keep-alive'} 
        auth_cookies = os.getenv('AUTH_COOKIES') 
        if auth_cookies is not None: 
            self.headers['Cookie'] = auth_cookies 
        self.parsed_url = parsed_url

I have chosen to pass the cookie in an environment variable called AUTH_COOKIES. Just set this variable to the output of the above Python script and it should work.

Getting it all together

All that’s left now is to pass the correct URL to Duplicity. In my case (personal folder on OneDrive for Business) the URL is: webdavs://user:pass@xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/Backup/Duplicity/Hostname/

Note the user:pass string – if you don’t pass an explicit username and password to Duplicity it will ask for them on the command line. You can pass any strings you like – it doesn’t matter, as the real authentication is based on the cookie.
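
To tie the two steps together, here is a minimal Python sketch of a wrapper that fetches fresh cookies and then runs Duplicity with AUTH_COOKIES set. It assumes the cookie script is saved as get-sharepoint-auth-cookie.py, that the patched duplicity is on the PATH, and that a plain "duplicity source target" invocation is what you want. The function names duplicity_env and run_backup are made up for this example.

```python
import os
import subprocess
import sys


def duplicity_env(cookie_str, base_env=None):
    """Return a copy of the environment with AUTH_COOKIES set for the patched backend."""
    env = dict(os.environ if base_env is None else base_env)
    env["AUTH_COOKIES"] = cookie_str.strip()
    return env


def run_backup(cookie_script, endpoint, username, password, source_dir, target_url):
    """Fetch fresh cookies, then run duplicity with them in its environment."""
    # The cookie script prints the cookie string on standard output.
    cookies = subprocess.check_output(
        [sys.executable, cookie_script, endpoint, username, password]
    ).decode("utf-8")
    # Returns duplicity's exit code.
    return subprocess.call(["duplicity", source_dir, target_url],
                           env=duplicity_env(cookies))


# Usage (needs network access, valid credentials and the patched duplicity):
#   run_backup("get-sharepoint-auth-cookie.py",
#              "https://xxx.sharepoint.com/personal/john_doe_xxx_com/",
#              "john.doe@xxx.com", "password", "/home/user/backup",
#              "webdavs://user:pass@xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/Backup/Duplicity/Hostname/")
```

Keep in mind that a password passed as a command-line argument is visible to other users in the process list for as long as the cookie script runs.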

15 thoughts on “Using Duplicity with Microsoft SharePoint/OneDrive for Business”

  1. Hi Chriss and thanks for sharing this. I am trying to run the script with: python get-sharepoint-auth-cookie.py ***-my.sharepoint.com username password

    But unfortunately nothing happens. Any idea?

    Please note there are some issues with the code that need fixing to make the Python script work – probably something went wrong with pasting it online:

    1. Line 58: indentation
    2. Line 77, 79, 87: adding closing double quotes
    3. Line 102: the format() function is missing the opening bracket

    Thanks

    1. Hi, I could not find the problems in the script you mentioned. I have however uploaded the raw script and added a link to it into the post, so hopefully it should be free of errors now, as I use this script daily and it works fine.

  2. Thank you for this very useful guide and for sharing the auto authentication code. For some reason, I keep getting an Invalid Request error. What should the endpointurl format look like? I fed my sharepoints url but not sure if this is correct.
    Also the code has some errors to it on the way it is displayed. Look for the soap:Fault line for example. The HTML appears to be fine but I think the blog formatting messes it up. Maybe this alters also my authxml and that is why I get an invalid request?

  3. For my case I got it to work by removing the argument input:
    endpoint = "https://xxxx-my.sharepoint.com"
    username = "xxxx@xxx.xxx"
    password = "pass"
    then in a bash script
    export AUTH_COOKIES="$(python onedriveauth.py)"
    and afterward
    duplicity -vi --ssl-no-check-certificate --full-if-older-than 6M /home/user/backup webdavs://user:pass@xxxx-my.sharepoint.com/personal/xxxx_xxx_xxx/Documents/Backups

  4. Wow. Thank you! I posted to the duplicity email list a while back about this and was told this couldn’t be done. This works absolutely perfectly for me.

    Mind if I post back to the duplicity list linking to this article? There seemed to be a bit of interest in the subject.

    Have you considered submitting the patch for webdavbackend.py to duplicity?

    1. Of course I don’t mind linking this article.

      As for the patch, mine is a bit of a short cut (i.e. hack) and to have this merged it would need to be done in a proper way. Unless of course duplicity folks accept it as is, which I’d be happy with too.

  5. Hi, thanks for creating this patch. I ran into a problem. I chose a rather large volume size of 200MB, though a restore would require an even larger download and temporary space. This worked well until duplicity tried to upload the signatures, which in my case are much larger, around 1.3 GB. SharePoint just gives an error 404. Looks to me like they cap uploads at around 300MB, and on the other side duplicity still won’t respect the volume size for signatures. A rather large bump in the way…

    1. This is unfortunately a known problem of SharePoint – the WebDAV API has a limit for single file size – around 300-400MB. Trying to upload/download larger files will likely fail in obscure ways.

      Handling larger files is possible using a dedicated REST API, but that of course would require a completely new backend implementation.

  6. Hey again. Been using your code for ages now (thanks again). About a month ago it stopped working, and I get “Authentication Failure” returned from the call to extSTS.srf.

    Double checked auth credentials, my account status etc, and all is well. Before I go wading into debugging (bit of a learning curve for me as I don’t speak SOAP, and know nothing about the microsoft systems you’re hooking into) thought I’d check with you if it was just me, or if Microsoft changed something that broke everybody?

  7. Hi! Thank you for this. Unfortunately I am facing the same error message as octofish9 above. I will try to look into this and see if I can fix it. Is anybody else working on this?
